pacman::p_load(tidyverse, ggplot2, patchwork, ggalluvial, gridExtra, sf, sfdep, fmsb, tmap)Take-home Exercise 3
Navigating Jakarta’s Traffic Network Using Grab Posisi Data
1. Overview
In this Take-Home Exercise, I will be leveraging the Grab Posisi dataset to analyse Jakarta’s growing need for on-demand ride-hailing services like Grab, given its increasing pace of urbanisation, particularly in areas with high Points of Interests (POIs).
Feel free to move over to my team’s Netlify site for a more in-depth understanding of our project scope and our progress: https://is415-projectgrab-g2.netlify.app/
With that said, let’s begin with my contributions to the project. In this exercise, I will delve into two aspects of the project - exploratory data analysis (EDA) and exploratory spatial data analysis (ESDA) using Local Moran’s I and LISA classificationon the Grab dataset. My goal is to experiment with as many charts as possible, e.g. heat maps, flow maps, choropleths, to answer critical questions that will be useful in informing us of ride-hailing demand, traffic hotspots, and movement patterns around key locations in Jakarta. Eventually, the most useful charts will be selected in the final project.
2. Let’s Set Up!
2.1 Installing Required Packages
Firstly, let us begin by loading these requiring libraries into our R environment.
tidyverse: a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structure.ggplot2: for creating advanced visualisations, graphics and maps using the Grammar of Graphics.patchworkand gridExtra: for arranging multiple ggplot2 maps beside each otherggalluvial: allows for building flow diagrams between origin and destination locations, similar to a sankey diagram.fmsb: for plotting radarcharts in Rtmap: for creating thematic maps with spatial data, providing custom styles, colors, legends, and interactivity
2.2 Importing Datasets
As mentioned, I will be using the Grab Posisi dataset for Jakarta which contains GPS pings from Grab vehicles, including timestamps, route data, and vehicle type (motorcycle/car).
In addition, my team will be delving into the district administration level to offer more meaningful analysis that is still computationally suitable for this project. We have also further augmented the dataset by including data categories such as, POIs, weather, population size.
Datasets used in this project
Jakarta Point of interests (POIs): https://data.humdata.org/dataset/hotosm_idn_points_of_interest
Weather API https://www.weatherbit.io/
Jakarta Population Density Data: https://storymaps.arcgis.com/stories/36e38ceefab0455eb6059a734381723c
Jakarta Map: https://data.humdata.org/dataset/cod-ab-idn
Grab Possi: https://engineering.grab.com/grab-posisi
2.2.1 Aspatial Data
trips <- readRDS("data/aspatial/trip_data.rds")
pois <- readRDS("data/aspatial/jakarta_pois.rds")
population <- read_csv("data/aspatial/jakarta_township_population.csv")Rows: 264 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): province, city, district, township
dbl (1): population_2019
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The trips tibble dataframe contains 55,995 unique Grab trips based on every 1 second GPS ping. This dataset also includes useful variables such as the driving_mode (car / motorcycle), total_duration_minutes, total_distance_km, location data (origin_district & destination_district) and time data (origin_day & origin_hour). I will be most interested in knowing where a Grab trip ends, i.e. its destination district, across which periods of the day and week.
glimpse(trips)Rows: 55,995
Columns: 32
$ trj_id <chr> "1", "10000", "10002", "10003", "1…
$ driving_mode <chr> "car", "motorcycle", "motorcycle",…
$ origin_time <dttm> 2019-04-11 14:17:35, 2019-04-16 0…
$ destination_time <dttm> 2019-04-11 14:36:24, 2019-04-16 0…
$ total_duration_minutes <dbl> 18.816667, 9.450000, 21.300000, 32…
$ total_distance_km <dbl> 5.943848, 2.809531, 8.040161, 16.8…
$ average_speed_kmh <dbl> 18.952926, 17.838289, 22.648340, 3…
$ origin_rawlat <dbl> -6.197622, -6.248311, -6.249766, -…
$ origin_rawlng <dbl> 106.7690, 106.9304, 106.9682, 106.…
$ destination_rawlat <dbl> -6.239817, -6.229177, -6.235125, -…
$ destination_rawlng <dbl> 106.8019, 106.9470, 106.8970, 106.…
$ origin_lat <dbl> -2939149, -2993318, -3001902, -295…
$ origin_lng <dbl> 13499453, 13554146, 13569122, 1355…
$ destination_lat <dbl> -2962961, -2989069, -2980935, -300…
$ destination_lng <dbl> 13504006, 13564901, 13543389, 1350…
$ origin_province <chr> "jakarta", "jakarta", "outside of …
$ origin_city <chr> "kota jakarta barat", "kota jakart…
$ origin_district <chr> "kebon jeruk", "duren sawit", "out…
$ destination_province <chr> "jakarta", "outside of jakarta", "…
$ destination_city <chr> "kota jakarta selatan", "outside o…
$ destination_district <chr> "kebayoran baru", "outside of jaka…
$ origin_datetime <dttm> 2019-04-11 14:17:35, 2019-04-16 0…
$ destination_datetime <dttm> 2019-04-11 14:36:24, 2019-04-16 0…
$ origin_day <chr> "thursday", "tuesday", "monday", "…
$ origin_hour <dbl> 14, 0, 5, 10, 23, 7, 4, 4, 2, 10, …
$ destination_day <chr> "thursday", "tuesday", "monday", "…
$ destination_hour <dbl> 14, 1, 6, 11, 23, 7, 4, 4, 4, 10, …
$ origin_time_cluster <chr> "afternoon lull", "midnight peak",…
$ destination_time_cluster <chr> "afternoon lull", "midnight peak",…
$ origin_date <date> 2019-04-11, 2019-04-16, 2019-04-0…
$ origin_weather_description <chr> "broken clouds", "scattered clouds…
$ origin_weather_description_category <chr> "not_rain", "not_rain", "Outside o…
Grab trips from outside of Jakarta and moving out of Jakarta are attributed to the outer islands of Jakarta, namely the Kepulauan Seribu Regency, which are chains of islands in the North of Jakarta’s coasts.
This dataset contains all unique 9,685 POIs found within each district of Jakarta. This will give us insights on how Grab is demanded based on the availability of POIs across different districts.
glimpse(pois)Rows: 9,685
Columns: 11
$ poi_name <chr> "TK Sekolah Kemurnian I", "PAUD Melati X", "Posyandu Cempaka"…
$ province <chr> "jakarta", "jakarta", "jakarta", "jakarta", "jakarta", "jakar…
$ city <chr> "kota jakarta barat", "kota jakarta timur", "kota jakarta tim…
$ district <chr> "taman sari", "pasar rebo", "pasar rebo", "cilandak", "koja",…
$ category <chr> "Facilities_Services", "Facilities_Services", "Essentials", "…
$ poi_type <chr> "kindergarten", "kindergarten", "clinic", "kindergarten", "co…
$ geometry <POINT [m]> POINT (13527881 -2927558), POINT (13508901 -3013973), P…
$ lat_4326 <dbl> -6.146429, -6.334694, -6.331832, -6.313693, -6.124969, -6.340…
$ lng_4326 <dbl> 106.8132, 106.8634, 106.8456, 106.7959, 106.9196, 106.8066, 1…
$ lat_5580 <dbl> -2927558, -3013973, -3009066, -2991388, -2940921, -3004218, -…
$ lng_5580 <dbl> 13527881, 13508901, 13502345, 13486227, 13575632, 13484988, 1…
Thirdly, the population dataframe contains all population counts in 2019 down to the township level. We will need to perform some data wrangling to summarise it to the district level later.
glimpse(population)Rows: 264
Columns: 5
$ province <chr> "Jakarta", "Jakarta", "Jakarta", "Jakarta", "Jakarta",…
$ city <chr> "Kota Jakarta Utara", "Kota Jakarta Barat", "Kota Jaka…
$ district <chr> "Pademangan", "Tambora", "Kramat Jati", "Jatinegara", …
$ township <chr> "ancol", "angke", "bale kambang", "bali mester", "bamb…
$ population_2019 <dbl> 4921, 1545, 3774, 2762, 2667, 1724, 2938, 3774, 982, 2…
2.2.2 Geospatial Data
Next, let’s proceed with producing the jakarta_district simple feature dataframe consisting of the district boundary of Jakarta by reading from the idn_admbnda_adm3_bps_20200401 shapefile below.
# Step 1: Read the Indonesia administrative boundary shapefile
indonesia <- st_read(
dsn = "data/geospatial/indon",
layer = "idn_admbnda_adm3_bps_20200401"
)Reading layer `idn_admbnda_adm3_bps_20200401' from data source
`C:\SamanthaxFoo\IS415-GAA\Take-home_Ex\Take-home_Ex3\data\geospatial\indon'
using driver `ESRI Shapefile'
Simple feature collection with 7069 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 95.01079 ymin: -11.00762 xmax: 141.0194 ymax: 6.07693
Geodetic CRS: WGS 84
# Step 2: Filter for DKI Jakarta and rename it to "Jakarta"
jakarta_district <- indonesia %>%
filter(ADM1_EN == "Dki Jakarta") %>%
mutate(ADM1_EN = "Jakarta")
# Step 3: Exclude districts belonging to "Kepulauan Seribu"
jakarta_district <- jakarta_district %>%
filter(ADM2_EN != "Kepulauan Seribu") # Exclude Kepulauan Seribu
# Step 4: Select only the required columns
jakarta_district <- jakarta_district %>%
dplyr::select(ADM1_EN, ADM2_EN, ADM3_EN, geometry)
# Step 5: Rename columns to more meaningful names
jakarta_district <- jakarta_district %>%
rename(
province = ADM1_EN,
city = ADM2_EN,
district = ADM3_EN,
)
# Step 6: Ensure the CRS is EPSG:4326 (WGS84)
jakarta_district <- jakarta_district %>%
st_transform(crs = 5580) # Transform to WGS84Warning in CPL_crs_from_input(x): GDAL Message 1: CRS EPSG:5580 is deprecated.
Its non-deprecated replacement EPSG:6384 will be used instead. To use the
original CRS, set the OSR_USE_NON_DEPRECATED configuration option to NO.
jakarta_district <- jakarta_district %>%
mutate(across(where(is.character), tolower))
jakarta_district_df <- jakarta_district %>%
st_drop_geometry()
# Step 7: Simplify the geometry with a smaller tolerance
jakarta_district <- jakarta_district %>%
st_simplify(dTolerance = 10.0) # Smaller tolerance for longitude/latitude dataclass(jakarta_district)[1] "sf" "data.frame"
Next, we can plot the interactive map of Jakarta and their districts using tmap and OpenStreetMap, which plots all of its 44 districts and their boundaries as shown below.
Plot the interactive map using tmap
tmap_mode("view")
tm_shape(jakarta_district) +
tm_polygons(
col = "district",
palette = "Blues",
border.col = "black",
lwd = 0.5,
popup.vars = "district"
) +
tm_text("district", size = 0.6, col = "black") +
tm_basemap("OpenStreetMap")Plot the interactive map using tmap
tmap_mode("plot") 3. Data Wrangling
3.1 Categorising Point of Interests (POIs) Data
I assisted my team mate, Jia Le, with handling some of the data cleaning process of the project. This includes re-categorising the 205 POIs identified for each district into its correct category as the initial categories were previously classified wrongly.
There are a total of 9 unique categories that needed some re-categorisation.
- Facilities_Services
- Essentials
- Offices_Business
- Cultural_Attractions
- Restaurants_Food
- Recreation_Entertainment
- Others
- Shops
- Tourism_Spots
For instance, the ‘taxi’ POI was initially classified as ‘Others’ instead of something more related like ‘Facilities_Services’.

3.2 Aggregate the Population Dataset
It is worth noting that we have two lakes that are considered districts here - Danau Sunter and Danau Sunter DII - which explains why they have 0 population size. We also do not have population data for districts outside of Jakarta here.
population <- population %>%
group_by(district) %>%
summarise(population_count = sum(population_2019)) %>%
mutate(district = tolower(district))
# Inspect
population# A tibble: 44 × 2
district population_count
<chr> <dbl>
1 cakung 38710
2 cempaka putih 7440
3 cengkareng 40470
4 cilandak 16880
5 cilincing 37471
6 cipayung 21336
7 ciracas 30015
8 danau sunter 0
9 danau sunter dll 0
10 duren sawit 38150
# ℹ 34 more rows
3.3 Prepare the Augmented POIs Dataset
For a more useful analysis, we can break it down by POI category types and get the count of POIs within each category for each district, as shown in the pois_num tibble below.
# Count the number of POIs by district
pois_num <- pois %>%
st_drop_geometry() %>%
group_by(district) %>%
summarise(num_of_pois = n(), .groups = 'drop')
# Inspect
pois_num# A tibble: 44 × 2
district num_of_pois
<chr> <int>
1 cakung 56
2 cempaka putih 38
3 cengkareng 298
4 cilandak 242
5 cilincing 119
6 cipayung 56
7 ciracas 126
8 danau sunter 1
9 danau sunter dll 3
10 duren sawit 173
# ℹ 34 more rows
3.4 Prepare the Augmented Trips Dataset
I noticed that we have records where destination districts are found outside of Jakarta’s boundary. For the purpose of analysing Grab trends within Jakarta, I will filter these trip records out.
trips <- trips %>%
filter(destination_district != "outside of jakarta" |
origin_district != "outside of jakarta")
# Inspect
head(trips)# A tibble: 6 × 32
trj_id driving_mode origin_time destination_time
<chr> <chr> <dttm> <dttm>
1 1 car 2019-04-11 14:17:35 2019-04-11 14:36:24
2 10000 motorcycle 2019-04-16 00:51:24 2019-04-16 01:00:51
3 10002 motorcycle 2019-04-08 05:55:41 2019-04-08 06:16:59
4 10003 car 2019-04-19 10:52:34 2019-04-19 11:24:44
5 10007 car 2019-04-11 07:05:00 2019-04-11 07:28:56
6 10012 car 2019-04-20 02:38:54 2019-04-20 04:05:16
# ℹ 28 more variables: total_duration_minutes <dbl>, total_distance_km <dbl>,
# average_speed_kmh <dbl>, origin_rawlat <dbl>, origin_rawlng <dbl>,
# destination_rawlat <dbl>, destination_rawlng <dbl>, origin_lat <dbl>,
# origin_lng <dbl>, destination_lat <dbl>, destination_lng <dbl>,
# origin_province <chr>, origin_city <chr>, origin_district <chr>,
# destination_province <chr>, destination_city <chr>,
# destination_district <chr>, origin_datetime <dttm>, …
Next, I will leverage the trips tibble dataframe to group all trips by destination_district and summarise them by number of trips, average duration (minutes) and average distance travelled (km).
trips_dest <- trips %>%
group_by(destination_district) %>%
summarise(
num_of_trips = n(),
avg_duration_minutes = mean(total_duration_minutes, na.rm = TRUE),
avg_distance_km = mean(total_distance_km, na.rm = TRUE)
) %>%
rename(district = destination_district) %>%
mutate(district = tolower(district))
trips_origin <- trips %>%
group_by(origin_district) %>%
summarise(
num_of_trips = n(),
avg_duration_minutes = mean(total_duration_minutes, na.rm = TRUE),
avg_distance_km = mean(total_distance_km, na.rm = TRUE)
) %>%
rename(district = origin_district) %>%
mutate(district = tolower(district))I will also join the pois_num dataframe to trips_dest and trips_origin by the district column so we can append the total number of POIs for each destination district.
trips_dest_pois <- trips_dest %>%
inner_join(pois_num, by = "district")
trips_origin_pois <- trips_origin %>%
inner_join(pois_num, by = "district")Let’s also include the population data from the population dataframe we imported into R.
# For the destination
district_dest <- trips_dest_pois %>%
left_join(population %>% mutate(district = tolower(district)),
by = "district")
district_dest# A tibble: 44 × 6
district num_of_trips avg_duration_minutes avg_distance_km num_of_pois
<chr> <int> <dbl> <dbl> <int>
1 cakung 844 21.5 6.40 56
2 cempaka putih 392 19.1 6.13 38
3 cengkareng 1120 19.9 6.20 298
4 cilandak 905 20.1 5.78 242
5 cilincing 277 24.4 7.24 119
6 cipayung 552 22.5 7.08 56
7 ciracas 546 22.6 7.03 126
8 danau sunter 17 17.3 5.21 1
9 danau sunter d… 33 16.9 5.00 3
10 duren sawit 877 21.1 6.32 173
# ℹ 34 more rows
# ℹ 1 more variable: population_count <dbl>
# For the origin
district_origin <- trips_origin_pois %>%
left_join(population %>% mutate(district = tolower(district)),
by = "district")
district_origin# A tibble: 44 × 6
district num_of_trips avg_duration_minutes avg_distance_km num_of_pois
<chr> <int> <dbl> <dbl> <int>
1 cakung 628 22.2 6.56 56
2 cempaka putih 460 17.9 5.52 38
3 cengkareng 928 20.2 6.00 298
4 cilandak 830 20.4 6.09 242
5 cilincing 266 24.7 7.30 119
6 cipayung 481 22.9 7.09 56
7 ciracas 607 23.1 7.38 126
8 danau sunter 4 19.4 8.55 1
9 danau sunter d… 34 21.3 7.54 3
10 duren sawit 891 21.3 6.39 173
# ℹ 34 more rows
# ℹ 1 more variable: population_count <dbl>
3.5 Prepare the Dataframe for Weather by Origin District
For a more aggregated analysis of the weather conditions influencing commuters to pay for Grab services, I will do a count of all number of trips based on each weather description (e.g. fog, haze, heavy rain) and based on each weather category (e.g. rain or no rain). Take note that I will remove all values returned as “outside of jakarta” since we do not have weather data for trip origins found outside of Jakarta.
weather_descriptions <- c(
"broken clouds", "scattered clouds",
"light rain", "overcast clouds", "moderate rain",
"haze", "fog", "few clouds", "heavy rain"
)
trips_origin_weather <- trips %>%
filter(origin_weather_description %in% weather_descriptions) %>%
group_by(origin_district, origin_weather_description) %>%
summarise(num_of_trips = n(), .groups = 'drop') %>%
pivot_wider(
names_from = origin_weather_description,
values_from = num_of_trips,
values_fill = list(num_of_trips = 0)
) %>%
rename(district = origin_district) %>%
mutate(district = tolower(district))
category_counts <- trips %>%
filter(origin_weather_description %in% weather_descriptions) %>%
group_by(origin_district, origin_weather_description_category) %>%
summarise(total_category_count = n(), .groups = 'drop') %>%
pivot_wider(
names_from = origin_weather_description_category,
values_from = total_category_count,
values_fill = list(total_category_count = 0)
) %>%
rename(district = origin_district) %>%
mutate(district = tolower(district))
trips_origin_weather <- trips_origin_weather %>%
left_join(category_counts, by = "district")
trips_origin_weather# A tibble: 44 × 12
district `broken clouds` fog haze `light rain` `moderate rain`
<chr> <int> <int> <int> <int> <int>
1 cakung 159 18 25 58 4
2 cempaka putih 334 0 0 44 2
3 cengkareng 374 0 8 67 4
4 cilandak 228 22 26 76 8
5 cilincing 110 0 0 26 1
6 cipayung 132 9 16 61 3
7 ciracas 156 18 26 64 13
8 danau sunter 3 0 0 0 0
9 danau sunter dll 24 0 0 1 0
10 duren sawit 225 23 26 80 3
# ℹ 34 more rows
# ℹ 6 more variables: `scattered clouds` <int>, `overcast clouds` <int>,
# `few clouds` <int>, `heavy rain` <int>, not_rain <int>, rain <int>
Let’s do an inner_join() to combine the weather data with our district_origin dataframe.
district_origin <- district_origin %>%
left_join(trips_origin_weather, by = "district") %>%
select(-ends_with(".x"), -ends_with(".y"))
# Inspect
district_origin# A tibble: 44 × 17
district num_of_trips avg_duration_minutes avg_distance_km num_of_pois
<chr> <int> <dbl> <dbl> <int>
1 cakung 628 22.2 6.56 56
2 cempaka putih 460 17.9 5.52 38
3 cengkareng 928 20.2 6.00 298
4 cilandak 830 20.4 6.09 242
5 cilincing 266 24.7 7.30 119
6 cipayung 481 22.9 7.09 56
7 ciracas 607 23.1 7.38 126
8 danau sunter 4 19.4 8.55 1
9 danau sunter d… 34 21.3 7.54 3
10 duren sawit 891 21.3 6.39 173
# ℹ 34 more rows
# ℹ 12 more variables: population_count <dbl>, `broken clouds` <int>,
# fog <int>, haze <int>, `light rain` <int>, `moderate rain` <int>,
# `scattered clouds` <int>, `overcast clouds` <int>, `few clouds` <int>,
# `heavy rain` <int>, not_rain <int>, rain <int>
3.6 Calculate Centroid of Each District’s Polygon
Finally, I will calculate the centroid of each polygon in jakarta_district and add the latitude and longitude coordinates of these centroids as new columns. We can also remove the province and city columns.
st_centroid(geometry): Calculates the centroid of each polygon in thegeometrycolumn ofjakarta_district, returning a point that represents the center of each district’s shape.centroid_latandcentroid_lng:st_coordinates(centroid)[, 2]: Extracts the latitude (y-coordinate) from each centroid and stores it incentroid_lat.st_coordinates(centroid)[, 1]: Extracts the longitude (x-coordinate) from each centroid and stores it incentroid_lng.
jakarta_district_centroid <- jakarta_district %>%
mutate(
geometry = st_centroid(geometry),
centroid_lat = st_coordinates(geometry)[, 2],
centroid_lng = st_coordinates(geometry)[, 1]
) %>%
select(district, centroid_lat, centroid_lng, geometry) # Select the necessary columns
# Display the result
jakarta_district_centroidSimple feature collection with 44 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 13484360 ymin: -3018643 xmax: 13585050 ymax: -2900311
Projected CRS: UCS-2000 / Ukraine TM zone 10
First 10 features:
district centroid_lat centroid_lng geometry
1 cakung -2971241 13570750 POINT (13570750 -2971241)
2 cempaka putih -2952903 13543000 POINT (13543000 -2952903)
3 cengkareng -2914110 13495544 POINT (13495544 -2914110)
4 cilandak -2982437 13489551 POINT (13489551 -2982437)
5 cilincing -2946831 13585052 POINT (13585052 -2946831)
6 cipayung -3018643 13526977 POINT (13526977 -3018643)
7 ciracas -3016628 13515117 POINT (13515117 -3016628)
8 danau sunter -2938886 13545267 POINT (13545267 -2938886)
9 danau sunter dll -2935989 13540105 POINT (13540105 -2935989)
10 duren sawit -2984795 13553448 POINT (13553448 -2984795)
4. Exploratory Data Analysis (EDA)
In this EDA process, I will leverage the datasets we’ve imported and wrangled to identify charts that produce the most interesting insights in terms of traffic pattersn, demand for Grab trends, temporal trends and POI analysis.
4.1 Traffic Trends & Demand Distribution
4.1.1 Distribution of Trips Origin and Destination By District
In terms of trip origins, districts such as Setia Budi, Grogol Petamburan, and Kebayoran Baru have the highest traffic volumes moving out of the district, indicating areas of high trip demand. In terms of trip destinations, districts such as Tanah A bang and Setia Budi are most popular for Grab trips. In contrast, districts like Johar Baru, Dananu Sunter DII, and Danau Sunter experience the lowest traffic volumes, suggesting potential areas of under-service or potentially low demand for Grab services.
# Count trips for origin districts
origin_counts <- trips %>%
count(origin_district) %>%
rename(total_trips = n)
# Count trips for destination districts
destination_counts <- trips %>%
count(destination_district) %>%
rename(total_trips = n)
# Create a bar plot for origin districts
p1 <- ggplot(origin_counts, aes(x = reorder(origin_district, -total_trips), y = total_trips)) +
geom_bar(stat = "identity", fill = "skyblue2") +
theme_gray() +
labs(title = "Total Number of Grab Trips per Origin District",
x = "Origin District",
y = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), # Center and style the title
axis.text.x = element_text(angle = 45, hjust = 1)
)
# Create a bar plot for destination districts
p2 <- ggplot(destination_counts, aes(x = reorder(destination_district, -total_trips), y = total_trips)) +
geom_bar(stat = "identity", fill = "pink2") +
theme_gray() +
labs(title = "Total Number of Grab Trips per Destination District",
x = "Destination District",
y = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), # Center and style the title
axis.text.x = element_text(angle = 45, hjust = 1)
)
p1 / p2 
We can also inspect how the population size of each district influences the number of trips taken to and from each district. Generally, districts with a high number of trips per capita from origin also displayed more frequent trips taken into the district.
Initially, there are districts with the highest trips per capita but in reality, they do NOT correspond to high number of trips (previous chart). For instance, Danau Sunter DII and Danau Sunter are districts with the lowest number of trips but since they also have a population of 0, resulting in a seemingly high number of trips per capita. Hence, we remove trip data from these two districts since it is not possible to divide a value by 0!
Prepare data
destination_data <- district_dest %>%
select(district, population_count, num_of_trips) %>%
filter(district != "Danau Sunter Dll" & district != "Danau Sunter") %>%
mutate(trips_per_capita_dest = num_of_trips / population_count) %>%
rename(district = district)
origin_data <- district_origin %>%
select(district, population_count, num_of_trips) %>%
filter(district != "Danau Sunter Dll" & district != "Danau Sunter") %>%
mutate(trips_per_capita_origin = num_of_trips / population_count) %>%
rename(district = district)
combined_data <- destination_data %>%
select(district, trips_per_capita = trips_per_capita_dest) %>%
mutate(trip_type = "Destination") %>%
bind_rows(
origin_data %>%
select(district, trips_per_capita = trips_per_capita_origin) %>%
mutate(trip_type = "Origin")
)# Create the line plot for trips per capita with specified colors and theme
ggplot(combined_data, aes(x = district, y = trips_per_capita, color = trip_type, group = trip_type)) +
geom_line(size = 1.2) +
geom_point(size = 2) +
labs(
title = "Number of Trips Per Capita by District",
x = "District",
y = "Trips Per Capita"
) +
theme_gray() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.text.x = element_text(angle = 90, hjust = 1)
) +
scale_color_manual(values = c("Origin" = "lightblue3", "Destination" = "pink3"))Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Instead, we can just separate the number of trips and the population size as such. Generally, we do not see any trends, meaning that the population size has no effect on the number of trips taken.
Prepare data
district_data <- district_dest %>%
select(district, population_count, num_of_trips) %>%
rename(trips = num_of_trips)
district_data <- district_data %>%
bind_rows(
district_origin %>%
select(district, population_count, num_of_trips) %>%
rename(trips = num_of_trips)
) %>%
group_by(district) %>%
summarise(
population_count = mean(population_count, na.rm = TRUE),
total_trips = mean(trips, na.rm = TRUE)
) %>%
arrange(desc(population_count))ggplot(district_data) +
geom_line(aes(x = reorder(district, population_count), y = total_trips, group = 1, color = "Average Trips Taken"), size = 1.2) + # Updated label
geom_line(aes(x = reorder(district, population_count), y = population_count * max(total_trips) / max(population_count), group = 2, color = "Population Count"), size = 1.2) +
geom_point(aes(x = reorder(district, population_count), y = total_trips, color = "Average Trips Taken"), size = 3) + # Updated label
geom_point(aes(x = reorder(district, population_count), y = population_count * max(total_trips) / max(population_count), color = "Population Count"), size = 3) +
labs(
title = "Mean Number of Trips and Population Count by District",
x = "District",
y = "Mean Number of Trips",
color = "Legend"
) +
theme_gray() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5), # Center, bold, size 16
axis.text.x = element_text(angle = 90, hjust = 1),
legend.position = "top"
) +
scale_color_manual(values = c("Average Trips Taken" = "lightblue3", "Population Count" = "pink3")) + # Updated color mapping
scale_y_continuous(
sec.axis = sec_axis(~ . * max(district_data$population_count) / max(district_data$total_trips), name = "Population Count") # Secondary axis
)
4.1.2 Top Origins and Destinations for Ride-hailing Trips
Next, we will prepare the top 5 districts with the most number of Grab trips based on trip origin and trip destinations.
# Calculate top 5 districts for origin_district
top_5_origin_districts <- trips %>%
count(origin_district) %>%
rename(total_trips = n) %>%
arrange(desc(total_trips)) %>%
slice(1:5)
# Calculate top 5 districts for destination_district
top_5_destination_districts <- trips %>%
count(destination_district) %>%
rename(total_trips = n) %>%
arrange(desc(total_trips)) %>%
slice(1:5)By plotting a bar chart, we can identify districts with the most demand for Grab services for both in-flow and out-flow of Grab trips. In particular, we can see that districts with most Grab trips originating from it corresponds with the highest number of Grab trips arriving into it (e.g. Setia Budi).
Such alignment suggests that these districts function as key hubs of movement within the city, likely due to high-density residential, commercial, or mixed-use areas.
# Plot for Top 5 Origin Districts
p1 <- ggplot(top_5_origin_districts, aes(y = reorder(origin_district, total_trips), x = total_trips)) +
geom_bar(stat = "identity", fill = "skyblue") +
theme_gray() +
labs(title = "Most Popular Origin Districts", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
# Plot for Top 5 Destination Districts
p2 <- ggplot(top_5_destination_districts, aes(y = reorder(destination_district, total_trips), x = total_trips)) +
geom_bar(stat = "identity", fill = "pink2") +
theme_gray() +
labs(title = "Most Popular Destination Districts", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
p1 + p2
4.1.3 Least popular origins and destinations for ride-hailing trips
We can also gather origin and destination districts with the least trips by arranging the total_trips in descending order and filtering districts with the lowest 5 number of trips.
Prepare data
orig_count <- trips %>%
count(origin_district) %>%
rename(trips = n) %>%
arrange(trips)
dest_count <- trips %>%
count(destination_district) %>%
rename(trips = n) %>%
arrange(trips)
bottom_5_orig <- orig_count %>%
slice_head(n = 5)
bottom_5_dest <- dest_count %>%
slice_head(n = 5)
bottom_5_orig$origin_district <- factor(bottom_5_orig$origin_district,
levels = bottom_5_orig$origin_district)
bottom_5_dest$destination_district <- factor(bottom_5_dest$destination_district,
levels = bottom_5_dest$destination_district)Here I plot the geom_bar() of the ggplot2 package. The results aligns with what we mentioned earlier, where the same origin districts with the least out-flow of Grab trips also happen to be districts with the least in-flow of Grab trips.
p1 <- ggplot(bottom_5_orig, aes(y = origin_district, x = trips)) +
geom_bar(stat = "identity", fill = "lightblue") +
labs(title = "Least Popular Origin Districts", y = "District", x = "Trips") +
theme_gray() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))
p2 <- ggplot(bottom_5_dest, aes(y = destination_district, x = trips)) +
geom_bar(stat = "identity", fill = "pink2") +
labs(title = "Least Popular Destination Districts", y = "District", x = "Trips") +
theme_gray() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))
p1 + p2
4.1.4 Popularity of Districts Based on Driving Mode
When using the geom_density() function to create density plots, we can see a smoothed version of the trip distance (in kilometres), further drilled down by the mode of driving (car vs motorcycle). There is almost no difference in relative distance travelled by cars or motorcycles.
For longer distances traveled, cars are slightly more preferredthan motorcycles, while for shorter trip distances, both driving modes are equally demanded.Thus, we only see some instances where cars are preferred for when journeys are longer.
ggplot(trips, aes(x = total_distance_km, fill = driving_mode)) +
geom_density(alpha = 0.8) +
labs(title = "Trip Distance (km) by Driving Mode (All Trips)",
x = "Total Distance (km)",
y = "Density") +
theme_gray() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) +
scale_fill_manual(values = c("lightblue", "pink2"))
This is the adjusted plot when we completely remove all trips with origins and destinations outside of Jakarta. The distance of Grab journeys significantly decrease from 80km to ~20km, and observations of popularity of the driving mode remains the same.
trips_inside_jakarta <- trips %>%
filter(origin_district != 'Outside of Jakarta', destination_district != 'Outside of Jakarta')
ggplot(trips_inside_jakarta, aes(x = total_distance_km, fill = driving_mode)) +
geom_density(alpha = 0.8) +
labs(title = "Trip Distance (km) by Driving Mode (Within Jakarta)",
x = "Total Distance (km)",
y = "Density") +
theme_gray() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) +
scale_fill_manual(values = c("lightblue", "pink2"))
We can further break this down into popular origins and destinations for car and motorcycle drivers respectively. It is rather intriguing that both modes of driving have similar top origin and destinations, as well as least popular origin and destinations travelled. Once again, indicating how both vehicle types are equally demanded in Jakarta.
1) For Car Drivers
# Filter trips for cars only
car_trips <- trips %>%
filter(driving_mode == 'car')
# Calculate counts for origin districts
origin_counts <- car_trips %>%
count(origin_district) %>%
rename(total_trips = n) %>%
arrange(desc(total_trips))
# Get top 5 origin districts
top_5_origin_districts <- origin_counts %>%
slice_head(n = 5)
# Get least 5 origin districts
bottom_5_origin_districts <- origin_counts %>%
slice_tail(n = 5)
# Set factor levels for top and bottom origin districts
top_5_origin_districts$origin_district <- factor(top_5_origin_districts$origin_district,
levels = top_5_origin_districts$origin_district)
bottom_5_origin_districts$origin_district <- factor(bottom_5_origin_districts$origin_district,
levels = bottom_5_origin_districts$origin_district)
# Plot for Top 5 Origin Districts in descending order
p1 <- ggplot(top_5_origin_districts, aes(y = reorder(origin_district, total_trips), x = total_trips)) +
geom_bar(stat = "identity", fill = "green3") +
theme_gray() +
labs(title = "Top 5 Popular Origin Districts (Cars)", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
# Plot for Least 5 Origin Districts
p2 <- ggplot(bottom_5_origin_districts, aes(y = origin_district, x = total_trips)) +
geom_bar(stat = "identity", fill = "lightcoral") +
theme_gray() +
labs(title = "Least 5 Popular Origin Districts (Cars)", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
# Calculate counts for destination districts
destination_counts <- car_trips %>%
count(destination_district) %>%
rename(total_trips = n) %>%
arrange(desc(total_trips))
# Get top 5 destination districts
top_5_destination_districts <- destination_counts %>%
slice_head(n = 5)
# Get least 5 destination districts
bottom_5_destination_districts <- destination_counts %>%
slice_tail(n = 5)
# Set factor levels for top and bottom destination districts
top_5_destination_districts$destination_district <- factor(top_5_destination_districts$destination_district,
levels = top_5_destination_districts$destination_district)
bottom_5_destination_districts$destination_district <- factor(bottom_5_destination_districts$destination_district,
levels = bottom_5_destination_districts$destination_district)
# Plot for Top 5 Destination Districts
p3 <- ggplot(top_5_destination_districts, aes(y = reorder(destination_district, total_trips), x = total_trips)) +
geom_bar(stat = "identity", fill = "lightblue") +
theme_gray() +
labs(title = "Top 5 Popular Destination Districts (Cars)", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
# Plot for Least 5 Destination Districts
p4 <- ggplot(bottom_5_destination_districts, aes(y = destination_district, x = total_trips)) +
geom_bar(stat = "identity", fill = "pink2") +
theme_gray() +
labs(title = "Least 5 Popular Destination Districts (Cars)", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
grid.arrange(p1, p2, p3, p4, ncol = 2)
2) For Motorcycle Drivers
Code
# Filter trips for motorcycles only
motorcycle_trips <- trips %>%
filter(driving_mode == 'motorcycle')
# Calculate counts for origin districts
origin_counts <- motorcycle_trips %>%
count(origin_district) %>%
rename(total_trips = n) %>%
arrange(desc(total_trips))
# Get top 5 origin districts
top_5_origin_districts <- origin_counts %>%
slice_head(n = 5)
# Get least 5 origin districts
bottom_5_origin_districts <- origin_counts %>%
slice_tail(n = 5)
# Set factor levels for top and bottom origin districts
top_5_origin_districts$origin_district <- factor(top_5_origin_districts$origin_district,
levels = top_5_origin_districts$origin_district)
bottom_5_origin_districts$origin_district <- factor(bottom_5_origin_districts$origin_district,
levels = bottom_5_origin_districts$origin_district)
# Plot for Top 5 Origin Districts in descending order
p1 <- ggplot(top_5_origin_districts, aes(y = reorder(origin_district, total_trips), x = total_trips)) +
geom_bar(stat = "identity", fill = "green3") +
theme_gray() +
labs(title = "Top 5 Popular Origin Districts (Motorcycles)", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
# Plot for Least 5 Origin Districts
p2 <- ggplot(bottom_5_origin_districts, aes(y = origin_district, x = total_trips)) +
geom_bar(stat = "identity", fill = "lightcoral") +
theme_gray() +
labs(title = "Least 5 Popular Origin Districts (Motorcycles)", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
# Calculate counts for destination districts
destination_counts <- motorcycle_trips %>%
count(destination_district) %>%
rename(total_trips = n) %>%
arrange(desc(total_trips))
# Get top 5 destination districts
top_5_destination_districts <- destination_counts %>%
slice_head(n = 5)
# Get least 5 destination districts
bottom_5_destination_districts <- destination_counts %>%
slice_tail(n = 5)
# Set factor levels for top and bottom destination districts
top_5_destination_districts$destination_district <- factor(top_5_destination_districts$destination_district,
levels = top_5_destination_districts$destination_district)
bottom_5_destination_districts$destination_district <- factor(bottom_5_destination_districts$destination_district,
levels = bottom_5_destination_districts$destination_district)
# Plot for Top 5 Destination Districts
p3 <- ggplot(top_5_destination_districts, aes(y = reorder(destination_district, total_trips), x = total_trips)) +
geom_bar(stat = "identity", fill = "lightblue") +
theme_gray() +
labs(title = "Top 5 Popular Destination Districts (Motorcycles)", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
# Plot for Least 5 Destination Districts
p4 <- ggplot(bottom_5_destination_districts, aes(y = destination_district, x = total_trips)) +
geom_bar(stat = "identity", fill = "pink2") +
theme_gray() +
labs(title = "Least 5 Popular Destination Districts (Motorcycles)", y = "District", x = "Number of Trips") +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
grid.arrange(p1, p2, p3, p4, ncol = 2)
4.2 Temporal Trends
4.2.1 Distribution of Trips Throughout the Week
It will also be interesting to investigate how the demand for Grab trips changes based on time, in particular by day of week. Here, I counted the number of trips for each day of the week and driving mode by using functions from the dplyr package.
We plot the geom_bar() from ggplot2 package here. We can see that weekends have slightly fewer number of trips taken than during the weekdays. Additionally, the choice between motorcycle or car is almost equal, with a marginally higher number of motorcycle trips taken across all days. Out of all days of the week, Thursdays have the highest number of trips taken which coincides with a school and work day, while Sunday which coincides with a rest day showed the least number of trips.
weekly_trips <- trips %>%
group_by(origin_day, driving_mode) %>%
summarise(total_trips = n(), .groups = 'drop') %>%
arrange(match(origin_day, c("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")))
weekly_trips$origin_day <- factor(weekly_trips$origin_day,
levels = c("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"))
ggplot(weekly_trips, aes(x = origin_day, y = total_trips, fill = driving_mode)) +
geom_bar(stat = "identity", position = "stack") + # Stacked bar chart
labs(title = "Number of Trips Throughout the Week by Driving Mode",
x = "Day of the Week",
y = "Number of Trips") +
scale_fill_manual(values = c("car" = "lightblue", "motorcycle" = "pink2")) + # Custom colors
theme_gray() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) +
geom_text(aes(label = total_trips),
position = position_stack(vjust = 0.5),
color = "black",
size = 4,
show.legend = FALSE) 
4.2.2 Distribution of number of trips throughout the Day
1) By Individual Hours
By using the geom_bar function, we can plot the count of Grab trips taken for each hour of a day. The peak traffic time appears to be at 10am while the quietest period is at 7pm in the evening. We cna also see the the demand for traffic rising from 12am onward which then declines in number of Grab trips past 10am.
hourly_trips <- trips %>%
group_by(origin_hour) %>%
summarise(number_of_trips = n(), .groups = 'drop') %>%
arrange(origin_hour)
hourly_trips$origin_hour <- factor(hourly_trips$origin_hour,
levels = sort(unique(trips$origin_hour)))
ggplot(hourly_trips, aes(x = origin_hour, y = number_of_trips, group = 1)) + # Set group = 1
geom_line(color = "lightblue3", linewidth = 1) +
geom_point(color = "lightblue3", size = 2) +
geom_text(aes(label = number_of_trips), vjust = -0.5, color = "black", size = 3) +
labs(title = "Distribution of Number of Trips Throughout the Day",
x = "Hour of the Day",
y = "Number of Trips") +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1) # Rotate x-axis text for readability
) +
theme(legend.position = "none")
2) By Aggregated Hour Category
Additionally, we can also aggregate the hour_category column into morning, afternoon, evening and midnight, including whether they are peak or lull periods, as such. This allows for easier interpretability of the data charts.
trips <- trips %>%
mutate(hour_category = case_when(
origin_hour %in% 1:3 ~ "midnight_peak",
origin_hour %in% 4:6 ~ "midnight_lull",
origin_hour %in% 7:9 ~ "morning_peak",
origin_hour %in% 10:12 ~ "morning_lull",
origin_hour %in% 13:15 ~ "afternoon_peak",
origin_hour %in% 16:18 ~ "afternoon_lull",
origin_hour %in% 19:21 ~ "evening_peak",
origin_hour %in% c(22:24,0) ~ "evening_lull"
)) %>%
mutate(hour_category = factor(hour_category,
levels = c("midnight_peak", "midnight_lull",
"morning_peak", "morning_lull",
"afternoon_peak", "afternoon_lull",
"evening_peak", "evening_lull")))As shown, the morning peak hours of 6am to 9am have the highest number of Grab trips while the evening peak periods of 6pm to 9pm have the least number of trips taken.
hourly_trips <- trips %>%
group_by(hour_category) %>%
summarise(number_of_trips = n(), .groups = 'drop')
hourly_trips$hour_category <- factor(hourly_trips$hour_category,
levels = unique(hourly_trips$hour_category))
ggplot(hourly_trips, aes(x = hour_category, y = number_of_trips, group = 1)) +
geom_line(color = "lightblue3", linewidth = 1) +
geom_point(color = "lightblue3", size = 2) +
geom_text(aes(label = number_of_trips), vjust = -0.5, color = "black", size = 3.5) +
labs(title = "Distribution of Number of Trips By Hour Category",
x = "Hour Category",
y = "Number of Trips") +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
) +
theme(legend.position = "none")
4.2.3 Distribution of Trip Duration
1) By Time of Day
Similar to the distribution of trip distance, by using geom_violin() from ggplot2 package, we can visualise the spread of trip duration (in minutes) from origin to destination, based on each hour category. Here, we randomly sample 2000 trip data to enable more readability since plotting the entire dataset will result in a chunk of dots that do not offer insights into the spread of trip distance.
Across each hour category, the total duration spent on Grab journeys tend to be similar in their median values. Perhaps, we see the most outliers in trip distances for trips taken during the morning lull and evening lull where extremely long trip distances are observed.
sampled_trips <- trips %>%
sample_n(2000)
ggplot(sampled_trips, aes(x = hour_category, y = total_duration_minutes, fill = hour_category)) +
geom_violin(trim = FALSE, alpha = 0.6, draw_quantiles = c("0.5")) +
geom_jitter(width = 0.2, size = 0.5, alpha = 0.2, color = "black") +
labs(title = "Distribution of Trip Duration (minutes) by Hour Category",
x = "Hour Category",
y = "Total Duration (minutes)") +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1) # Tilt x-axis labels
) +
scale_fill_manual(values = c("lightblue", "lightcoral", "green3", "lightpink", "lightyellow", "lightgray", "lightcyan", "lightgoldenrod")) +
theme(legend.position = "none") 
When I remove outliers in total_duration_minutes that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, we can once again observe that the median duration taken is the same across all time of day.
# Calculate IQR and filter outliers in total_duration_minutes
filtered_trips <- trips %>%
filter(between(total_duration_minutes,
quantile(total_duration_minutes, 0.25) - 1.5 * IQR(total_duration_minutes),
quantile(total_duration_minutes, 0.75) + 1.5 * IQR(total_duration_minutes))) %>%
sample_n(2000)
# Plot without outliers
ggplot(filtered_trips, aes(x = hour_category, y = total_duration_minutes, fill = hour_category)) +
geom_violin(trim = FALSE, alpha = 0.6, draw_quantiles = c("0.5")) +
geom_jitter(width = 0.2, size = 0.5, alpha = 0.2, color = "black") +
labs(title = "Distribution of Trip Duration (minutes) by Hour Category - No Outliers",
x = "Hour Category",
y = "Total Duration (minutes)") +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1) # Tilt x-axis labels
) +
scale_fill_manual(values = c("lightblue", "lightcoral", "green3", "lightpink", "lightyellow", "lightgray", "lightcyan", "lightgoldenrod")) +
theme(legend.position = "none")
2) By Day of Week
In terms of day of the week, Mondays, Tuesdays and Wednesdays tend to have a slightly shorter trip duration than the rest of the days of the week.
# Sample 2000 trips
sampled_trips <- trips %>%
sample_n(2000)
# Reorder origin_day as a factor with levels from Monday to Sunday
sampled_trips$origin_day <- factor(sampled_trips$origin_day,
levels = c("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"))
# Create the violin plot
ggplot(sampled_trips, aes(x = origin_day, y = total_duration_minutes, fill = origin_day)) +
geom_violin(trim = FALSE, alpha = 0.6, draw_quantiles = c("0.5")) +
geom_jitter(width = 0.2, size = 0.5, alpha = 0.2, color = "black") +
labs(title = "Distribution of Trip Duration (minutes) by Day of Week",
x = "Day of Week",
y = "Trip Duration (minutes)") +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1) # Tilt x-axis labels
) +
scale_fill_manual(values = c("lightblue", "lightcoral", "green3", "lightpink", "lightyellow", "lightgray", "lightcyan")) +
theme(legend.position = "none")
We have a clearer view of the spread of trip duration when we remove all outliers. We can see that trips taken on Wednesday tend to be shorter journeys than other days of the week.
# Remove outliers based on IQR and then sample 2000 trips
filtered_trips <- trips %>%
filter(between(total_duration_minutes,
quantile(total_duration_minutes, 0.25) - 1.5 * IQR(total_duration_minutes),
quantile(total_duration_minutes, 0.75) + 1.5 * IQR(total_duration_minutes))) %>%
sample_n(2000)
# Reorder origin_day as a factor with levels from Monday to Sunday
filtered_trips$origin_day <- factor(filtered_trips$origin_day,
levels = c("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"))
# Create the violin plot without outliers
ggplot(filtered_trips, aes(x = origin_day, y = total_duration_minutes, fill = origin_day)) +
geom_violin(trim = FALSE, alpha = 0.6, draw_quantiles = c("0.5")) +
geom_jitter(width = 0.2, size = 0.5, alpha = 0.2, color = "black") +
labs(title = "Distribution of Trip Duration (minutes) by Day of Week - No Outliers",
x = "Day of Week",
y = "Trip Duration (minutes)") +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1) # Tilt x-axis labels
) +
scale_fill_manual(values = c("lightblue", "lightcoral", "green3", "lightpink", "lightyellow", "lightgray", "lightcyan")) +
theme(legend.position = "none")
3) By Driving Mode
We can also use the driving_mode column of the trips data to plot the distribution of trip duration. We can see that motorcycles are slighty more preferred for shorter trip duration than cars,though cars are almost equally as demanded. In terms of longer journeys which involve more minutes, taking the car is more preferred as a mode of driving.
ggplot(trips, aes(x = total_duration_minutes, fill = driving_mode)) +
geom_density(alpha = 0.8) +
labs(title = "Trip Duration (minutes) by Driving Mode (All Trips)",
x = "Total Duration (minutes)",
y = "Density") +
theme_gray() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) +
scale_fill_manual(values = c("lightblue", "pink2"))
We can also observe the distribution of trip duration within Jakarta only, meaning we filter out all origin and destinations that are discovered to be outside of Jakarta. Likewise, cars are slightly more preferred than motorcycles for longer trip duration.
ggplot(trips_inside_jakarta, aes(x = total_duration_minutes, fill = driving_mode)) +
geom_density(alpha = 0.8) +
labs(title = "Trip Duration (minutes) by Driving Mode (Within Jakarta)",
x = "Total Duration (minutes)",
y = "Density") +
theme_gray() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) +
scale_fill_manual(values = c("lightblue", "pink2"))
4.2.4 Distribution of Trip Distance
1) By Time of Day
Again, we randomly sample 2000 trip data to plot the violin plot of total_distance_km against hour categories. We can observe that longer distances are traveled during the afternoon lull and evening peak periods on average which could be attributed to high traffic volumes or jams on the road.
sampled_trips <- trips %>%
sample_n(2000)
ggplot(sampled_trips, aes(x = hour_category, y = total_distance_km, fill = hour_category)) +
geom_violin(trim = FALSE, alpha = 0.6, draw_quantiles = "0.5") +
geom_jitter(width = 0.2, size = 0.5, alpha = 0.2, color = "black") +
labs(title = "Distribution of Trip Distance (km) by Hour Category",
x = "Hour Category",
y = "Total Distance (km)") +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1) # Tilt x-axis labels
) +
scale_fill_manual(values = c("lightblue", "lightcoral", "green3", "lightpink", "lightyellow", "lightgray", "lightcyan", "lightgoldenrod")) +
theme(legend.position = "none") 
2) By Day of Week
The chart suggests that trips taken on Saturday and Sunday have the highest average distance travelled, meaning that people are more willing to travel to destinations further away from where they stay on these days.
library(ggplot2)
library(dplyr)
# Sample 2000 trips
sampled_trips <- trips %>%
sample_n(2000)
# Reorder origin_day as a factor with levels from Monday to Sunday
sampled_trips$origin_day <- factor(sampled_trips$origin_day,
levels = c("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"))
# Create the violin plot
ggplot(sampled_trips, aes(x = origin_day, y = total_distance_km, fill = origin_day)) +
geom_violin(trim = FALSE, alpha = 0.6, draw_quantiles = "0.5") +
geom_jitter(width = 0.2, size = 0.5, alpha = 0.2, color = "black") +
labs(title = "Distribution of Trip Distance (km) by Day of Week",
x = "Day of Week",
y = "Total Distance (km)") +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1) # Tilt x-axis labels
) +
scale_fill_manual(values = c("lightblue", "lightcoral", "green3", "lightpink", "lightyellow", "lightgray", "lightcyan")) +
theme(legend.position = "none")
4.2.5 Day of Week and Hourly Patterns in Ride-hailing Demand
Now, let’s generate a heatmap displaying the distribution of trips across different days of the week (using the origin_day column) and the specified hour categories, with the fill color indicating the number of trips. The days are ordered from Monday to Sunday, making it easier to interpret the data.
We can observe the highest number of trips taken during both morning peak and lull periods on Thursdays, as well as, Saturday midnight lull periods. On the other hand, Monday and Tuesday evening peaks have the lowest number of trips.
# Create a new summary data frame using hour_category
heatmap_data <- trips %>%
group_by(origin_day, hour_category) %>%
summarise(number_of_trips = n(), .groups = 'drop') %>%
arrange(origin_day, hour_category)
# Convert origin_day to a factor with specific order
heatmap_data$origin_day <- factor(heatmap_data$origin_day,
levels = c("monday", "tuesday", "wednesday", "thursday",
"friday", "saturday", "sunday"))
# Convert hour_category to a factor for better ordering
heatmap_data$hour_category <- factor(heatmap_data$hour_category,
levels = c("midnight_peak", "midnight_lull",
"morning_peak", "morning_lull",
"afternoon_peak", "afternoon_lull",
"evening_peak", "evening_lull"))
# Create the heatmap
ggplot(heatmap_data, aes(x = hour_category, y = origin_day)) +
geom_tile(aes(fill = number_of_trips), color = "white") +
geom_text(aes(label = number_of_trips), color = "black", size = 4) +
scale_fill_gradient(low = "lightblue", high = "darkblue", name = "Number of Trips") +
labs(title = "Heatmap of Number of Trips by Day and Hour Category",
x = "Hour Category",
y = "Day") +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
)
4.3 Impact of Weather
4.3.1 Volume of Trips per District based on Weather Conditions (Rain or No Rain)
We can visualise that the number of trips taken when there is no rain outweights the number of trips for rainy weather conditins rather significantly across all districts. This is good news for Grab since their ride-hailing services are still as demanded for not just during trip origins where rain might be seen as an inconvenience, but also during regular weather conditions where there is no rain.
district_origin_long <- district_origin %>%
select(district, not_rain, rain) %>%
pivot_longer(cols = c(not_rain, rain), names_to = "weather_type", values_to = "trip_count")
# Plot the data
ggplot(district_origin_long, aes(x = district, y = trip_count, fill = weather_type)) +
geom_bar(stat = "identity", position = "stack") +
labs(
title = "Trip Counts by District and Weather Type",
x = "District",
y = "Number of Trips",
fill = "Weather Type"
) +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.text.x = element_text(angle = 90, hjust = 1)
) +
scale_fill_manual(values = c("not_rain" = "skyblue", "rain" = "steelblue"))
4.3.2 Volume of Trips based on Specific Weather Sub-Categories
The most number of trips were taken when the weather condition for the origin was ‘broken clouds’, ‘scattered clouds’ and ‘overcast clouds’ which is Jakarta’s most common weather condition throughout the year, with occasional wet weathers. This indicates that Grab services are as demanded even when there is no rain. In fact, we can notice that origin trips with light rain has more demand than trips found having heavy rain.
# Summing trip counts by weather condition
weather_summary <- district_origin %>%
summarise(across(starts_with("broken clouds"):starts_with("heavy rain"), sum)) %>%
pivot_longer(cols = everything(), names_to = "weather_condition", values_to = "trip_count")
# Bar chart
ggplot(weather_summary, aes(x = reorder(weather_condition, -trip_count), y = trip_count)) +
geom_bar(stat = "identity", fill = "lightblue3") +
geom_text(aes(label = trip_count), vjust = -0.5, size = 4) +
labs(
title = "Number of Trips by Weather Condition",
x = "Weather Condition",
y = "Number of Trips"
) +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
)
4.3.3 Impact of Trip Duration on Trip Volume by Weather Conditions (Rain or No Rain)
The analysis reveals that trips with shorter durations (measured in minutes) tend to have higher demand, with a greater number of trips recorded for shorter journeys compared to longer ones. This trend persists across different weather conditions, including both rainy and non-rainy weather.
This pattern suggests that weather conditions—whether rainy or no rain—do not significantly impact commuter behavior in relation to trip length. In other words, commuters are inclined to travel shorter distances more frequently, regardless of weather, while longer trips are less frequent under all weather conditions.
district_long <- district_origin %>%
pivot_longer(
cols = c("rain","not_rain"),
names_to = "weather_condition",
values_to = "trip_count"
)
ggplot(district_long, aes(x = avg_duration_minutes, y = trip_count, color = weather_condition)) +
geom_point(alpha = 0.7) +
geom_smooth(method="lm", se = TRUE) +
facet_wrap(~ weather_condition, scales = "free_y") +
scale_color_viridis_d(option = "plasma") +
labs(
title = "Relationship Between Average Trip Duration and Trip Count by Weather Condition",
x = "Average Trip Duration (minutes)",
y = "Number of Trips"
) +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
legend.position = "none",
axis.text.x = element_text(angle = 45, hjust = 1)
)`geom_smooth()` using formula = 'y ~ x'

4.3.4 Impact of Trip Duration on Trip Volume by Weather Sub-Categories
Next, I will further drill down the rain and not_rain weather categories by its sub-categories. Interestingly, trip origins that are under heavy rain conditions tend to somewhat accumulate more trips especially for journeys that are longer in duration (minutes). This means that commuters likely expect for the heavy rain to last for a long time and would hence, result in a greater demand for Grab services when there is heavy rain. In contrast, start of trips with light rain shows the reverse trend where less trips are demanded for when duration of trip is longer.
district_long <- district_origin %>%
pivot_longer(
cols = c("broken clouds", "scattered clouds", "light rain", "overcast clouds",
"moderate rain", "haze", "fog", "few clouds", "heavy rain"),
names_to = "weather_condition",
values_to = "trip_count"
)
ggplot(district_long, aes(x = avg_duration_minutes, y = trip_count, color = weather_condition)) +
geom_point(alpha = 0.7) +
geom_smooth(method="lm", se = TRUE) +
facet_wrap(~ weather_condition, scales = "free_y") +
scale_color_viridis_d(option = "plasma") +
labs(
title = "Relationship Between Average Trip Duration and Trip Count by Weather Condition",
x = "Average Trip Duration (minutes)",
y = "Number of Trips"
) +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
legend.position = "none",
axis.text.x = element_text(angle = 45, hjust = 1)
)`geom_smooth()` using formula = 'y ~ x'

4.3.2 Impact of Weather on Trip Demand By Motorcycles and Cars
Prepare data by aggregating district, weather & driving mode
trips_origin_weather_vehicle <- trips %>%
filter(origin_weather_description %in% weather_descriptions) %>%
group_by(origin_district, origin_weather_description, driving_mode) %>%
summarise(num_of_trips = n(), .groups = 'drop') %>%
# Create a new column for each combination of weather and driving mode
pivot_wider(
names_from = c(origin_weather_description, driving_mode),
values_from = num_of_trips,
values_fill = list(num_of_trips = 0)
) %>%
rename(district = origin_district) %>%
mutate(district = tolower(district))
# The rest remains the same
category_counts <- trips %>%
group_by(origin_district, origin_weather_description_category) %>%
summarise(total_category_count = n(), .groups = 'drop') %>%
pivot_wider(
names_from = origin_weather_description_category,
values_from = total_category_count,
values_fill = list(total_category_count = 0)
) %>%
rename(district = origin_district) %>%
mutate(district = tolower(district))
# Join the dataframes
trips_origin_weather_vehicle <- trips_origin_weather_vehicle %>%
left_join(category_counts, by = "district")
# Prepare data by gathering columns into long format
plot_data <- trips_origin_weather_vehicle %>%
select(district, contains("car"), contains("motorcycle")) %>%
pivot_longer(
cols = -district,
names_to = c("weather", "driving_mode"),
names_sep = "_",
values_to = "num_of_trips"
) %>%
filter(driving_mode %in% c("car", "motorcycle")) %>%
group_by(weather, driving_mode) %>%
summarise(total_trips = sum(num_of_trips, na.rm = TRUE), .groups = 'drop')
head(plot_data)# A tibble: 6 × 3
weather driving_mode total_trips
<chr> <chr> <int>
1 broken clouds car 10433
2 broken clouds motorcycle 8925
3 few clouds car 6
4 few clouds motorcycle 8
5 fog car 74
6 fog motorcycle 131
The demand for cars seem to be higher than motorcycles during trip origins with light rain and overcast clouds. This could be attributed to safety concerns of using motorcycles on slippery roads. Ironically, the trips taken during periods of heavy rain seem to be dominated by motorcycles than cars. Therefore, we cannot make much conclusions of how the weather impacts the mode of driving, but we can see that weather does minimal impact on commuter’s decision of their preferred vehicle.
ggplot(plot_data, aes(x = weather, y = total_trips, fill = driving_mode)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(
aes(label = total_trips),
position = position_dodge(width = 0.9),
vjust = -0.5, color = "black", size = 3.5
) +
scale_fill_manual(values = c("car" = "skyblue", "motorcycle" = "salmon")) +
labs(
title = "Comparison of Car and Motorcycle Trips Across Different Weather Conditions",
x = "Weather Condition",
y = "Total Number of Trips",
fill = "Driving Mode"
) +
theme_gray() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
legend.position = "top"
)
4.4 Point of Interest (POI) Analysis
4.4.1 Distribution of POIs Across Districts
We can visualise the distribution of POIs across each district using the district_dest augmented dataframe we created previously. With that said, we can observe that the top three highest number POIs are found in the districts Kebayoran Baru, Setia Budi and Grogol Petamburan.
# Plot the distribution of POIs across districts
ggplot(district_dest, aes(x = reorder(district, num_of_pois), y = num_of_pois)) +
geom_bar(stat = "identity", fill = "lightblue3") +
labs(title = "Distribution of POIs Across Districts",
x = "District",
y = "Number of POIs") +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
)
Next, let us use the pois_num dataframe created earlier. I will plot a heatmap plot using the geom_tile() function of the ggplot2 package to understand how the number of trips demanded vary by district and POI category. We can evidently see that districts like Danau Sunter have low to no POIs, with only essentials (e.g. clinics, veteranarians) found in the district. However, in districts that may be more populated or urbanised, we see a higher number of POIs for the office businesses, restaurants and food, and shops as marked by the darker shades of blue below.
# Count the number of POIs by district
pois_num_cat <- pois %>%
st_drop_geometry() %>%
group_by(district, category) %>%
summarise(num_of_pois = n(), .groups = 'drop')
heatmap_plot <- ggplot(pois_num_cat, aes(x = category, y = district, fill = num_of_pois)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(
title = "Heatmap of Points of Interest by District and Category",
x = "POI Category",
y = "District",
fill = "Number of POIs"
) +
theme_minimal() + # Clean theme
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, size = 16, face = "bold")
)
print(heatmap_plot)
4.4.2 Demand of Ride-hailing Services by POI
To prepare the dataset for the analysis involving POIs, I will augment the jakarta_pois.rds dataset by aggregating the trips data from the trips simple dataframe. Restaurants, office businesses and shops seem to dominate the POIs available in Jakarta, while cultural attractions and recreation entertainment suggests to be less available.
category_counts <- pois %>%
group_by(category) %>%
summarise(num_of_pois = n()) %>%
arrange(desc(num_of_pois))
ggplot(category_counts, aes(x = reorder(category, -num_of_pois), y = num_of_pois, fill = num_of_pois)) +
geom_bar(stat = "identity") +
geom_text(aes(label = num_of_pois), vjust = -0.5, size = 4) +
labs(
title = "Distribution of POIs Across Jakarta",
x = "Category",
y = "Number of POIs"
) +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1, color = "grey30"),
legend.position = "none"
) +
scale_fill_gradient(low = "lightblue", high = "darkblue") 
We can also plot the number of POIs discovered across each POI category type by using a radar chart. Here, I am employing the radarchart() function of the fmsb package.
Code
category_counts <- pois %>%
group_by(category) %>%
summarise(num_of_pois = n()) %>%
arrange(desc(num_of_pois))
max_value <- max(category_counts$num_of_pois)
min_value <- 0
radar_data <- rbind(
rep(max_value, nrow(category_counts)),
rep(min_value, nrow(category_counts)),
t(category_counts$num_of_pois)
)
radar_data <- as.data.frame(radar_data)
colnames(radar_data) <- category_counts$category
radarchart(
radar_data,
axistype = 1,
pcol = rgb(0.2, 0.4, 0.4, 0.6), # Line color
pfcol = rgb(0.2, 0.2, 0.2, 0.2), # Fill color
plwd = 2, # Line width
cglcol = "lightblue3", # Grid line color
cglty = 1, # Grid line type
axislabcol = "black", # Axis label color
caxislabels = seq(0, max_value, length.out = 5), # Axis labels
cglwd = 1.2, # Grid line width
vlcex = 1 # Category label size
)
title("Distribution of POIs Across Jakarta", cex.main = 1.4)
Next, I will go deeper into finding out how many Grab trips were taken to these destinations at the district level. It can be observed that a larger number of POIs (pink) within a district lead to similarly higher number of Grab trips taken (blue) to the district.
# Reshape the data for plotting
district_long <- district_dest %>%
pivot_longer(cols = c(num_of_trips, num_of_pois),
names_to = "metric",
values_to = "count")
# Plot the line charts
ggplot(district_long, aes(x = district, y = count, group = metric, color = metric)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
labs(
title = "Trends in Number of Trips and Availability of POIs by District",
x = "District",
y = "Number of Trips",
color = "Metric"
) +
scale_color_manual(values = c("num_of_trips" = "lightblue3", "num_of_pois" = "pink3")) +
theme_gray() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(size = 16, hjust = 0.5, face = "bold")
)
The scatterplot below indicates that more Grab trips are taken for smaller average distances travelled from the passenger’s origin, suggesting most trips are bounded within shorter distances. This is surprising since I would have assumed a higher demand for Grab services for when the destination is far from the passenger’s origin.
In addition, we can also observe that destinations with a higher number of POI (darker shades of blue) tend to result in more Grab trips taken. This means that passengers are more likely to take a trip to districts in Jakarta that offer more POIs.
district_pois_5 <- district_dest %>%
mutate(pois_category = cut(num_of_pois,
breaks = 5,
labels = c("100", "200", "300", "400", "500"),
include.lowest = TRUE))
color_palette <- c("100" = "#A3C6E4", # Light blue
"200" = "#76A4D6", # Medium light blue
"300" = "#4A83C6", # Medium blue
"400" = "#1F5DA0", # Dark blue
"500" = "#003C71") # Darker blue
ggplot(district_pois_5, aes(x = num_of_trips, y = avg_distance_km, size = num_of_pois, color = pois_category)) +
geom_point(alpha = 0.7) +
labs(
title = "Correlation of Number of Trips and Average Distance by Number of POIs",
x = "Number of Trips",
y = "Average Distance (km)",
size = "Number of POIs",
color = "Number of POIs" # Renamed here
) +
scale_size(range = c(1, 6)) + # Adjust point size range
scale_color_manual(values = color_palette) + # Apply custom color palette
theme_gray() +
theme(
plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
legend.position = "right"
)
4.4.3 Correlation of Number of Trips and POIs
We can observe a positive correlation between the number of trips taken to a destination and the availability of POIs in the destination district based on the linear regression model below. This means that more trips are demanded to the destination when there are more POIs available, typically these districts might be within the city with more amenities for people to eat, shop and work.
# Load required libraries
library(dplyr)
library(ggplot2)
# Calculate the correlation coefficient
correlation_result <- cor(district_dest$num_of_trips, district_dest$num_of_pois)
# Create a scatter plot with a regression line
ggplot(district_dest, aes(x = num_of_pois, y = num_of_trips)) +
geom_point(color = "blue", size = 3, alpha = 0.6) + # Points
geom_smooth(method = "lm", se = TRUE, color = "red") + # Linear regression line
labs(
title = "Correlation between Number of Trips and Number of POIs",
x = "Number of POIs",
y = "Number of Trips"
) +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold")
)`geom_smooth()` using formula = 'y ~ x'

# Print the correlation result
print(paste("Correlation between Number of Trips and Number of POIs:", round(correlation_result, 2)))[1] "Correlation between Number of Trips and Number of POIs: 0.8"
By including the average distance traveled, we can see that fewer passengers tend to take ride-hailing services like Grab for when the average distance of trips are shorter. For district destinations with higher number of POIs available, trip distances tend to be shorter too, which could mean that passengers likely reside near to these POIs.
district_dist_6 <- district_dest %>%
mutate(distance_category = cut(avg_distance_km,
breaks = 6,
labels = c("5.0", "5.5", "6.0", "6.5", "7.0", "7.5"),
include.lowest = TRUE))
distance_color_palette <- c("5.0" = "#A3C6E4", # Light blue
"5.5" = "#76A4D6", # Medium light blue
"6.0" = "#4A83C6", # Medium blue
"6.5" = "#1F5DA0", # Dark blue
"7.0" = "#003C71", # Darker blue
"7.5" = "midnightblue") # Deepest blue
district_dist_6 <- district_dist_6 %>%
mutate(distance_category = factor(distance_category))
ggplot(district_dist_6, aes(x = num_of_trips, y = num_of_pois)) +
geom_point(aes(color = distance_category, size = distance_category), alpha = 0.7) +
labs(
title = "Relationship between Number of Trips and Number of POIs by Average Distance",
x = "Number of Trips",
y = "Number of POIs",
size = "Average Distance Category (km)",
color = "Average Distance Category (km)"
) +
scale_size_manual(values = c(1, 2, 3, 4, 5, 6)) +
scale_color_manual(values = distance_color_palette) +
theme_gray() +
theme(
plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
axis.title = element_text(size = 12),
legend.position = "right"
)
4.4.3 Traffic Volumes By Time of Day and POI Type
Next, i will want to understand traffic volumes by Hour Category and POI Category. Before that, I will create a new dataframe district_hour_pois to aggregate the 8 hour categories into 4 instead, namely morning, afternoon, evening and midnight.
Code
# Define time-of-day categories based on hour_category labels
trips_num <- trips %>%
group_by(destination_district, hour_category) %>%
summarise(num_of_trips = n(), .groups = 'drop') %>%
rename(district = destination_district) %>%
mutate(
district = tolower(district),
# Group hour_category into broader time-of-day groups
time_of_day = case_when(
hour_category %in% c("morning_peak", "morning_lull") ~ "morning",
hour_category %in% c("afternoon_peak", "afternoon_lull") ~ "afternoon",
hour_category %in% c("evening_peak", "evening_lull") ~ "evening",
hour_category %in% c("midnight_peak", "midnight_lull") ~ "midnight",
TRUE ~ "other"
),
# Set specific ordering for time_of_day
time_of_day = factor(time_of_day, levels = c("morning", "afternoon", "evening", "midnight"))
) %>%
# Group by district and new time_of_day categories, summing num_of_trips
group_by(district, time_of_day) %>%
summarise(num_of_trips = sum(num_of_trips), .groups = 'drop')
# Count the number of POIs by district & category
pois_num_cat <- pois %>%
st_drop_geometry() %>%
group_by(district, category) %>%
summarise(num_of_pois = n(), .groups = 'drop')
# Combine the two datasets by district
district_hour_pois <- trips_num %>%
inner_join(pois_num_cat, by = "district", relationship = "many-to-many") %>%
arrange(district, time_of_day)
# View the combined dataset
head(district_hour_pois)# A tibble: 6 × 5
district time_of_day num_of_trips category num_of_pois
<chr> <fct> <int> <chr> <int>
1 cakung morning 357 Cultural_Attractions 1
2 cakung morning 357 Essentials 12
3 cakung morning 357 Facilities_Services 16
4 cakung morning 357 Offices_Business 18
5 cakung morning 357 Recreation_Entertainment 2
6 cakung morning 357 Restaurants_Food 3
Across all POI types, districts offering restaurants and shops tend to garner the highest demand for Grab services. In constrast, recreation entertainment showed to be least demanded by Grab users.
In the morning, Grab services are most demanded to destinations that offer facilities services, shops and essentials, where commuters are likely headed to work or settle household errands. This trend can also be observed in the afternoon.
When it comes to evening time, the number of trips taken to restaurants are seen to rise likely since commuters are done with the day and are headed for dinner. It is also surprising that Grab transport are also highly active during midnight (12am - 6pm), namely where destinations are facilities services, shops and restaurants. This could be due to commuters ending work late or heading out for work in the early morning.
aggregated_pois <- district_hour_pois %>%
group_by(category, time_of_day) %>%
summarise(num_of_trips = sum(num_of_trips, na.rm = TRUE), .groups = "drop")
ggplot(aggregated_pois, aes(x = category, y = num_of_trips, fill = time_of_day)) +
geom_bar(stat = "identity") +
geom_text(aes(label = num_of_trips),
position = position_stack(vjust = 0.5),
color = "black",
size = 3.5) +
labs(
title = "Number of POIs Visited Using Grab by Time of Day",
x = "POI Category",
y = "Number of POIs"
) +
theme_gray() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
) +
scale_fill_brewer(palette = "Blues")
4.5 Origin-Destination Spatial Flows
Next, we can also delve into the spatial flows between origin and destination districts. To do so, I will leverage the geom_alluviam() function of the ggplot2 and ggalluvial packages to map the flows. I will also ientify the top 5 origin / destination districts by doing a filter() on the data as shown.
4.5.1 Flow from All Origins to Top 5 Destinations
We can see how complex the spread of origin districts are, where a thicker line connecting the origin and destination resembles a higher demand for Grab. In this case, trips taken outside of Jakarta and moving into the district of Kebayoran Lama are the highest. The reverse is also true; Grab trips moving from Kebayoran Lama to outside of Jakarta are the highest.
# Identify the top 5 destination districts
top_5_destinations <- trips %>%
count(destination_district) %>%
top_n(5, wt = n) %>%
pull(destination_district)
# Filter the data to include only top 5 destinations
flow_data <- trips %>%
filter(destination_district %in% top_5_destinations) %>%
count(origin_district, destination_district) %>%
rename(count = n)
# Plot the alluvial diagram
ggplot(flow_data, aes(axis1 = origin_district, axis2 = destination_district, y = count)) +
geom_alluvium(aes(fill = destination_district)) +
geom_stratum() +
geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
labs(title = "Flow from Origin to Top 5 Destination Districts",
x = "Districts",
y = "Number of Trips") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))
4.5.2 Flow Pattern from Top 5 Origins to All Destinations
As for the trips moving out from the top 5 origin districts, we see that most trips come from outside of Jakarta, meaning that a substantial flow of Grab traffic comes from outbound of Jakarta. As compared to the previous flow chart, the destinations of these 5 origins seem to spread evenly across all districts of Jakarta. Perhaps, the most popular destination spots for Grab trips are found in Kebayoran Lama, Setia Budi and Tanah Abang.
# Identify the top 5 origin districts based on the number of trips
top_5_origins <- trips %>%
count(origin_district) %>%
top_n(5, wt = n) %>%
pull(origin_district)
# Filter the data to include only top 5 origins and their corresponding destination districts
flow_data <- trips %>%
filter(origin_district %in% top_5_origins) %>%
count(origin_district, destination_district) %>%
rename(count = n)
# Plot the alluvial diagram
ggplot(flow_data, aes(axis1 = origin_district, axis2 = destination_district, y = count)) +
geom_alluvium(aes(fill = origin_district)) + # Use origin_district for fill
geom_stratum() +
geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 4) +
labs(title = "Flow from Top 5 Origin Districts to Destination Districts",
x = "Districts",
y = "Number of Trips") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1))
4.5.3 Flow Patterns from Top 5 Origins to Top 5 Destinations
I believe we can make the most meaningful analysis when just looking at the top 5 originating and top 5 destination districts where these districts had the highest number of Grab trips taken. We can observe that Setia Budi is a popular origin and destination for Grab services, in fact, many of the trips are found within the district itself.
# Step 1: Identify the top 5 origin districts
top_5_origins <- trips %>%
count(origin_district) %>%
top_n(5, wt = n) %>%
pull(origin_district)
# Step 2: Identify the top 5 destination districts
top_5_destinations <- trips %>%
count(destination_district) %>%
top_n(5, wt = n) %>%
pull(destination_district)
# Step 3: Filter the trips data for top districts only
flow_data <- trips %>%
filter(origin_district %in% top_5_origins & destination_district %in% top_5_destinations) %>%
count(origin_district, destination_district) %>%
rename(count = n)
# Step 4: Create a new column to distinguish origins and destinations
flow_data <- flow_data %>%
mutate(district_type = ifelse(origin_district %in% top_5_origins, "Origin", "Destination"))
# Step 5: Plot the alluvial diagram
ggplot(flow_data, aes(axis1 = origin_district, axis2 = destination_district, y = count)) +
geom_alluvium(aes(fill = origin_district)) +
geom_stratum() +
geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 4, color = "black") +
labs(title = "Flow of Trips Between Top 5 Origin and Top 5 Destination Districts in Jakarta",
x = "Districts",
y = "Number of Trips") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1))
5. Exploratory Spatial Data Analysis - Using Local Indicators of Spatial Association (LISA)
Now, I would also like to perform local spatial autocorrelation to identify specific areas of clustering of number of Grab trips at a local district level using LISA methods (Anselin, 1995).
To conduct LISA analysis, I will be using Local Moran’s Ii which is an extension of Global Moran’s I, designed to identify local clusters and spatial outliers within a dataset. Local Moran’s Ii provides a measure of autocorrelation at individual locations, identifying where significant clustering or outliers exist.
I will be conducting these LISA analysis on the
Overall Number of Trips
Number of Trips by Driving Mode (Car/Motorcycle)
Number of Trips by Weather (Rain/No Rain)
Number of POIs
Number of Trips per POI
Number of Trips per Capita (i.e. population size)
5.1 Computing Contiguity Neighbours
Firstly, let’s create a district_geom simple feature dataframe containing the polygon geometry data of each unique district of Jakarta.
district_geom <- jakarta_district %>%
select(district, geometry)
# Inspect
district_geomSimple feature collection with 44 features and 1 field
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
First 10 features:
district geometry
1 cakung POLYGON ((13577740 -2957434...
2 cempaka putih POLYGON ((13546309 -2954155...
3 cengkareng POLYGON ((13501268 -2903641...
4 cilandak POLYGON ((13493440 -2969564...
5 cilincing POLYGON ((13600325 -2936635...
6 cipayung POLYGON ((13537176 -3005377...
7 ciracas POLYGON ((13521824 -2996302...
8 danau sunter POLYGON ((13546437 -2939299...
9 danau sunter dll POLYGON ((13541670 -2937824...
10 duren sawit POLYGON ((13548180 -2972327...
Next, we append the geometry column to the existing district_dest dataframe
district_origin <- district_origin %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
# Inspect
district_originSimple feature collection with 44 features and 17 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 18
district num_of_trips avg_duration_minutes avg_distance_km num_of_pois
<chr> <int> <dbl> <dbl> <int>
1 cakung 628 22.2 6.56 56
2 cempaka putih 460 17.9 5.52 38
3 cengkareng 928 20.2 6.00 298
4 cilandak 830 20.4 6.09 242
5 cilincing 266 24.7 7.30 119
6 cipayung 481 22.9 7.09 56
7 ciracas 607 23.1 7.38 126
8 danau sunter 4 19.4 8.55 1
9 danau sunter d… 34 21.3 7.54 3
10 duren sawit 891 21.3 6.39 173
# ℹ 34 more rows
# ℹ 13 more variables: population_count <dbl>, `broken clouds` <int>,
# fog <int>, haze <int>, `light rain` <int>, `moderate rain` <int>,
# `scattered clouds` <int>, `overcast clouds` <int>, `few clouds` <int>,
# `heavy rain` <int>, not_rain <int>, rain <int>, geometry <POLYGON [m]>
district_dest <- district_dest %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
# Inspect
district_destSimple feature collection with 44 features and 6 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 7
district num_of_trips avg_duration_minutes avg_distance_km num_of_pois
<chr> <int> <dbl> <dbl> <int>
1 cakung 844 21.5 6.40 56
2 cempaka putih 392 19.1 6.13 38
3 cengkareng 1120 19.9 6.20 298
4 cilandak 905 20.1 5.78 242
5 cilincing 277 24.4 7.24 119
6 cipayung 552 22.5 7.08 56
7 ciracas 546 22.6 7.03 126
8 danau sunter 17 17.3 5.21 1
9 danau sunter d… 33 16.9 5.00 3
10 duren sawit 877 21.1 6.32 173
# ℹ 34 more rows
# ℹ 2 more variables: population_count <dbl>, geometry <POLYGON [m]>
We can now derive our neighbour list object by utilising the st_contiguity() function from the sfdep package to create contiguity weight matrices for the study area. Let us configure different queen parameters that I will eventually like users of the ShinyApp to play with.
queen = TRUE: refers to computing the neighbour list by queen contiguity (neighbours share a common edge and or corner)
queen = FALSE: refers to computing the neighbour list by rook contiguity (similar to bishop, where neighbours share a common edge only)
Using district_origin sf dataframe and queen = TRUE
origin_nb_queen <- st_contiguity(district_origin$geometry, queen=TRUE)
summary(origin_nb_queen)Neighbour list object:
Number of regions: 44
Number of nonzero links: 216
Percentage nonzero weights: 11.15702
Average number of links: 4.909091
Link number distribution:
2 3 4 5 6 7 8 9
1 11 4 12 11 2 2 1
1 least connected region:
16 with 2 links
1 most connected region:
21 with 9 links
Using district_destsf dataframe and queen = TRUE
dest_nb_queen <- st_contiguity(district_dest$geometry, queen=TRUE)
summary(dest_nb_queen)Neighbour list object:
Number of regions: 44
Number of nonzero links: 216
Percentage nonzero weights: 11.15702
Average number of links: 4.909091
Link number distribution:
2 3 4 5 6 7 8 9
1 11 4 12 11 2 2 1
1 least connected region:
16 with 2 links
1 most connected region:
21 with 9 links
Using district_origin sf dataframe and queen = FALSE
origin_nb_rook <- st_contiguity(district_origin$geometry, queen=FALSE)
summary(origin_nb_rook)Neighbour list object:
Number of regions: 44
Number of nonzero links: 210
Percentage nonzero weights: 10.84711
Average number of links: 4.772727
Link number distribution:
2 3 4 5 6 7 8 9
2 10 6 11 11 2 1 1
2 least connected regions:
6 16 with 2 links
1 most connected region:
21 with 9 links
Using district_dest sf dataframe and queen = FALSE
dest_nb_rook <- st_contiguity(district_dest$geometry, queen=FALSE)
summary(dest_nb_rook)Neighbour list object:
Number of regions: 44
Number of nonzero links: 210
Percentage nonzero weights: 10.84711
Average number of links: 4.772727
Link number distribution:
2 3 4 5 6 7 8 9
2 10 6 11 11 2 1 1
2 least connected regions:
6 16 with 2 links
1 most connected region:
21 with 9 links
5.2 Computing Row-Standardised Weight Matrix
origin_wt_queen <- st_weights(origin_nb_queen, style = "W", allow_zero = TRUE)We will mutate the newly created neighbour list object origin_nb_queen and weight matrix origin_wt_queen into our existing district_origin. This results in a newly created object called origin_queen_wm.
origin_queen_wm <- district_origin %>%
mutate(nb = origin_nb_queen,
wt = origin_wt_queen,
.before = 1)
# Inspect
origin_queen_wmSimple feature collection with 44 features and 19 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 20
nb wt district num_of_trips avg_duration_minutes avg_distance_km
* <nb> <list> <chr> <int> <dbl> <dbl>
1 <int [4]> <dbl> cakung 628 22.2 6.56
2 <int [5]> <dbl> cempaka p… 460 17.9 5.52
3 <int [5]> <dbl> cengkareng 928 20.2 6.00
4 <int [5]> <dbl> cilandak 830 20.4 6.09
5 <int [3]> <dbl> cilincing 266 24.7 7.30
6 <int [3]> <dbl> cipayung 481 22.9 7.09
7 <int [3]> <dbl> ciracas 607 23.1 7.38
8 <int [3]> <dbl> danau sun… 4 19.4 8.55
9 <int [3]> <dbl> danau sun… 34 21.3 7.54
10 <int [3]> <dbl> duren saw… 891 21.3 6.39
# ℹ 34 more rows
# ℹ 14 more variables: num_of_pois <int>, population_count <dbl>,
# `broken clouds` <int>, fog <int>, haze <int>, `light rain` <int>,
# `moderate rain` <int>, `scattered clouds` <int>, `overcast clouds` <int>,
# `few clouds` <int>, `heavy rain` <int>, not_rain <int>, rain <int>,
# geometry <POLYGON [m]>
dest_wt_queen <- st_weights(dest_nb_queen, style = "W", allow_zero = TRUE)We will mutate the newly created neighbour list object dest_nb_queen and weight matrix dest_wt_queen into our existing district_dest. This results in a newly created object called dest_queen_wm.
dest_queen_wm <- district_dest %>%
mutate(nb = dest_nb_queen,
wt = dest_wt_queen,
.before = 1)
# Inspect
dest_queen_wmSimple feature collection with 44 features and 8 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 9
nb wt district num_of_trips avg_duration_minutes avg_distance_km
* <nb> <list> <chr> <int> <dbl> <dbl>
1 <int [4]> <dbl> cakung 844 21.5 6.40
2 <int [5]> <dbl> cempaka p… 392 19.1 6.13
3 <int [5]> <dbl> cengkareng 1120 19.9 6.20
4 <int [5]> <dbl> cilandak 905 20.1 5.78
5 <int [3]> <dbl> cilincing 277 24.4 7.24
6 <int [3]> <dbl> cipayung 552 22.5 7.08
7 <int [3]> <dbl> ciracas 546 22.6 7.03
8 <int [3]> <dbl> danau sun… 17 17.3 5.21
9 <int [3]> <dbl> danau sun… 33 16.9 5.00
10 <int [3]> <dbl> duren saw… 877 21.1 6.32
# ℹ 34 more rows
# ℹ 3 more variables: num_of_pois <int>, population_count <dbl>,
# geometry <POLYGON [m]>
origin_wt_rook <- st_weights(origin_nb_rook, style = "W", allow_zero = TRUE)We will mutate the newly created neighbour list object origin_nb_rook and weight matrix origin_wt_rook into our existing district_origin. This results in a newly created object called origin_rook_wm.
origin_rook_wm <- district_origin %>%
mutate(nb = origin_nb_rook,
wt = origin_wt_rook,
.before = 1)
# Inspect
origin_rook_wmSimple feature collection with 44 features and 19 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 20
nb wt district num_of_trips avg_duration_minutes avg_distance_km
* <nb> <list> <chr> <int> <dbl> <dbl>
1 <int [4]> <dbl> cakung 628 22.2 6.56
2 <int [5]> <dbl> cempaka p… 460 17.9 5.52
3 <int [5]> <dbl> cengkareng 928 20.2 6.00
4 <int [5]> <dbl> cilandak 830 20.4 6.09
5 <int [3]> <dbl> cilincing 266 24.7 7.30
6 <int [2]> <dbl> cipayung 481 22.9 7.09
7 <int [3]> <dbl> ciracas 607 23.1 7.38
8 <int [3]> <dbl> danau sun… 4 19.4 8.55
9 <int [3]> <dbl> danau sun… 34 21.3 7.54
10 <int [3]> <dbl> duren saw… 891 21.3 6.39
# ℹ 34 more rows
# ℹ 14 more variables: num_of_pois <int>, population_count <dbl>,
# `broken clouds` <int>, fog <int>, haze <int>, `light rain` <int>,
# `moderate rain` <int>, `scattered clouds` <int>, `overcast clouds` <int>,
# `few clouds` <int>, `heavy rain` <int>, not_rain <int>, rain <int>,
# geometry <POLYGON [m]>
dest_wt_rook <- st_weights(dest_nb_rook, style = "W", allow_zero = TRUE)We will mutate the newly created neighbour list object dest_nb_rook and weight matrix dest_wt_rook into our existing district_dest. This results in a newly created object called dest_rook_wm.
dest_rook_wm <- district_dest %>%
mutate(nb = dest_nb_rook,
wt = dest_wt_rook,
.before = 1)
# Inspect
dest_rook_wmSimple feature collection with 44 features and 8 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 9
nb wt district num_of_trips avg_duration_minutes avg_distance_km
* <nb> <list> <chr> <int> <dbl> <dbl>
1 <int [4]> <dbl> cakung 844 21.5 6.40
2 <int [5]> <dbl> cempaka p… 392 19.1 6.13
3 <int [5]> <dbl> cengkareng 1120 19.9 6.20
4 <int [5]> <dbl> cilandak 905 20.1 5.78
5 <int [3]> <dbl> cilincing 277 24.4 7.24
6 <int [2]> <dbl> cipayung 552 22.5 7.08
7 <int [3]> <dbl> ciracas 546 22.6 7.03
8 <int [3]> <dbl> danau sun… 17 17.3 5.21
9 <int [3]> <dbl> danau sun… 33 16.9 5.00
10 <int [3]> <dbl> duren saw… 877 21.1 6.32
# ℹ 34 more rows
# ℹ 3 more variables: num_of_pois <int>, population_count <dbl>,
# geometry <POLYGON [m]>
5.3 Computing Local Moran’s Ii
Local Moran’s Ii is an extension of Global Moran’s I, designed to identify local clusters and spatial outliers within a dataset. Local Moran’s Ii provides a measure of autocorrelation at individual locations, identifying where significant clustering or outliers exist.
Let’s utilise the local_moran() function of sfdep to handle the computations.
lisa_queen_origin <- origin_queen_wm %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)
# Inspect
lisa_queen_originSimple feature collection with 44 features and 31 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 32
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.0386 -0.00988 0.0365 0.254 0.800 0.88 0.44 -0.361
2 0.431 -0.0546 0.0772 1.75 0.0803 0.04 0.02 -0.736
3 0.235 -0.0142 0.0244 1.60 0.110 0.14 0.07 0.499
4 0.0846 0.00614 0.00223 1.66 0.0968 0.16 0.08 0.168
5 0.348 -0.100 0.552 0.603 0.546 0.6 0.3 -0.412
6 0.207 0.0404 0.183 0.390 0.697 0.7 0.35 -0.865
7 0.237 -0.0445 0.0656 1.10 0.272 0.28 0.14 -0.359
8 0.379 0.0341 1.15 0.321 0.748 0.82 0.41 -0.692
9 0.780 0.00807 0.986 0.778 0.437 0.42 0.21 -0.173
10 -0.0614 -0.00521 0.0257 -0.350 0.726 0.8 0.4 0.276
# ℹ 34 more rows
# ℹ 24 more variables: kurtosis <dbl>, mean <fct>, median <fct>, pysal <fct>,
# nb <nb>, wt <list>, district <chr>, num_of_trips <int>,
# avg_duration_minutes <dbl>, avg_distance_km <dbl>, num_of_pois <int>,
# population_count <dbl>, `broken clouds` <int>, fog <int>, haze <int>,
# `light rain` <int>, `moderate rain` <int>, `scattered clouds` <int>,
# `overcast clouds` <int>, `few clouds` <int>, `heavy rain` <int>, …
lisa_queen_dest <- dest_queen_wm %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)
# Inspect
lisa_queen_destSimple feature collection with 44 features and 20 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 21
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.0392 -0.00703 0.00561 -0.429 0.668 0.8 0.4 -0.130
2 0.626 -0.0000627 0.167 1.53 0.126 0.14 0.07 -0.275
3 0.518 -0.0257 0.140 1.45 0.146 0.2 0.1 -0.111
4 0.206 -0.00545 0.0163 1.66 0.0974 0.08 0.04 -0.0136
5 0.485 0.0158 0.559 0.627 0.531 0.56 0.28 -0.158
6 0.197 0.0234 0.122 0.497 0.619 0.72 0.36 -0.501
7 0.335 -0.0104 0.138 0.929 0.353 0.4 0.2 -0.260
8 0.381 -0.0836 1.21 0.423 0.672 0.76 0.38 -0.143
9 1.16 -0.304 1.42 1.23 0.220 0.26 0.13 -0.363
10 0.0351 0.0141 0.0194 0.151 0.880 0.9 0.45 0.181
# ℹ 34 more rows
# ℹ 13 more variables: kurtosis <dbl>, mean <fct>, median <fct>, pysal <fct>,
# nb <nb>, wt <list>, district <chr>, num_of_trips <int>,
# avg_duration_minutes <dbl>, avg_distance_km <dbl>, num_of_pois <int>,
# population_count <dbl>, geometry <POLYGON [m]>
lisa_rook_origin <- origin_rook_wm %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)
# Inspect
lisa_rook_originSimple feature collection with 44 features and 31 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 32
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.0386 0.00538 0.0265 0.204 0.838 0.84 0.42 -0.424
2 0.431 0.00649 0.135 1.16 0.248 0.22 0.11 -0.188
3 0.235 -0.0123 0.0208 1.72 0.0856 0.12 0.06 -0.0714
4 0.0846 0.000320 0.00266 1.64 0.102 0.08 0.04 0.149
5 0.348 0.0545 0.465 0.431 0.667 0.64 0.33 0.0393
6 0.370 -0.0383 0.330 0.711 0.477 0.48 0.24 -0.579
7 0.237 0.0123 0.0470 1.04 0.300 0.26 0.13 -0.669
8 0.379 0.0691 1.06 0.301 0.764 0.8 0.4 -0.565
9 0.780 -0.0657 1.01 0.840 0.401 0.38 0.19 -0.268
10 -0.0614 -0.0212 0.0195 -0.288 0.773 0.82 0.41 0.233
# ℹ 34 more rows
# ℹ 24 more variables: kurtosis <dbl>, mean <fct>, median <fct>, pysal <fct>,
# nb <nb>, wt <list>, district <chr>, num_of_trips <int>,
# avg_duration_minutes <dbl>, avg_distance_km <dbl>, num_of_pois <int>,
# population_count <dbl>, `broken clouds` <int>, fog <int>, haze <int>,
# `light rain` <int>, `moderate rain` <int>, `scattered clouds` <int>,
# `overcast clouds` <int>, `few clouds` <int>, `heavy rain` <int>, …
lisa_rook_dest <- dest_rook_wm %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)
# Inspect
lisa_rook_destSimple feature collection with 44 features and 20 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 21
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.0392 0.0134 0.00695 -0.630 0.529 0.6 0.3 0.241
2 0.626 0.0189 0.197 1.37 0.171 0.22 0.11 0.152
3 0.518 -0.0241 0.178 1.28 0.199 0.2 0.1 0.145
4 0.206 -0.00658 0.0126 1.90 0.0580 0.1 0.05 0.0504
5 0.485 -0.0561 0.448 0.808 0.419 0.46 0.23 -0.296
6 0.303 0.00124 0.181 0.709 0.478 0.52 0.26 -0.0281
7 0.335 0.00764 0.148 0.851 0.395 0.44 0.22 -0.251
8 0.381 -0.351 1.24 0.658 0.510 0.5 0.25 0.155
9 1.16 -0.177 1.29 1.18 0.240 0.28 0.14 -0.184
10 0.0351 -0.0213 0.0154 0.454 0.650 0.7 0.35 0.224
# ℹ 34 more rows
# ℹ 13 more variables: kurtosis <dbl>, mean <fct>, median <fct>, pysal <fct>,
# nb <nb>, wt <list>, district <chr>, num_of_trips <int>,
# avg_duration_minutes <dbl>, avg_distance_km <dbl>, num_of_pois <int>,
# population_count <dbl>, geometry <POLYGON [m]>
5.4 Visualising Local Moran’s Ii
To ease our analysis, an approach we can take is to plot the local Moran’s I values across to visualise the observed values across each district. We’ll use a choropleth map from the tmap package to analyse the spatial patterns.
tm_shape(lisa_queen_origin) +
tm_fill("ii",
palette = c("#B3EBF2","green1","orange","red"),
title = "Local Moran's I",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "District-Level Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
tmap_mode("view")
lisa_queen_origin <- lisa_queen_origin %>%
mutate(label = paste("District:", district, "| Local Moran's I:", round(ii, 3)))
tm_shape(lisa_queen_origin) +
tm_fill("ii",
palette = c("#B3EBF2", "green1", "orange", "red"),
title = "Local Moran's I",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")tm_shape(lisa_queen_dest) +
tm_fill("ii",
palette = c("#B3EBF2","green1","orange","red"),
title = "Local Moran's I",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "District-Level Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_queen_dest <- lisa_queen_dest %>%
mutate(label = paste("District:", district, "| Local Moran's I:", round(ii, 3)))
tmap_mode("view")
tm_shape(lisa_queen_dest) +
tm_fill("ii",
palette = c("#B3EBF2", "green1", "orange", "red"),
title = "Local Moran's I",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")tm_shape(lisa_rook_origin) +
tm_fill("ii",
palette = c("#B3EBF2","green1","orange","red"),
title = "Local Moran's I",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "District-Level Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.6,
legend.text.size = 0.6,
legend.hist.size = 0.6,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_rook_origin <- lisa_rook_origin %>%
mutate(label = paste("District:", district, "| Local Moran's I:", round(ii, 3)))
tmap_mode("view")
tm_shape(lisa_rook_origin) +
tm_fill("ii",
palette = c("#B3EBF2", "green1", "orange", "red"),
title = "Local Moran's I",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")tm_shape(lisa_rook_dest) +
tm_fill("ii",
palette = c("#B3EBF2","green1","orange","red"),
title = "Local Moran's I",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "District-Level Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_rook_dest <- lisa_rook_dest %>%
mutate(label = paste("District:", district, "| Local Moran's I:", round(ii, 3)))
tmap_mode("view")
tm_shape(lisa_rook_dest) +
tm_fill("ii",
palette = c("#B3EBF2", "green1", "orange", "red"),
title = "Local Moran's I",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")5.5 Visualising Local Moran’s Ii P-value
As mentioned in the section above, we shall not hastily conclude the clustering results observed. Instead, let us also evaluate whether the observed clustering (high-high or low-low) is statistically significant or could have occurred by chance. Hence, we can derive the p-values from Local Moran’s I by using the p_ii_sim field to determine statistical signficance across districts.
tm_shape(lisa_queen_origin) +
tm_fill("p_ii_sim",
palette = c("green3","lightyellow","orange","red"),
title = "p-value",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "Statistically Significant Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_queen_origin <- lisa_queen_origin %>%
mutate(label = paste("District:", district, "| P-value:", round(p_ii_sim, 3)))
tmap_mode("view")
tm_shape(lisa_queen_origin) +
tm_fill("p_ii_sim",
palette = c("green3","lightyellow","orange","red"),
title = "P-value",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")tm_shape(lisa_queen_dest) +
tm_fill("p_ii_sim",
palette = c("green3","lightyellow","orange","red"),
title = "p-value",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "Statistically Significant Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_queen_dest <- lisa_queen_dest %>%
mutate(label = paste("District:", district, "| P-value:", round(p_ii_sim, 3)))
tmap_mode("view")
tm_shape(lisa_queen_dest) +
tm_fill("p_ii_sim",
palette = c("green3","lightyellow","orange","red"),
title = "P-value",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")tm_shape(lisa_rook_origin) +
tm_fill("p_ii_sim",
palette = c("green3","lightyellow","orange","red"),
title = "p-value",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "Statistically Significant Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_rook_origin <- lisa_rook_origin %>%
mutate(label = paste("District:", district, "| P-value:", round(p_ii_sim, 3)))
tmap_mode("view")
tm_shape(lisa_rook_origin) +
tm_fill("p_ii_sim",
palette = c("green3","lightyellow","orange","red"),
title = "P-value",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")tm_shape(lisa_rook_dest) +
tm_fill("p_ii_sim",
palette = c("green3","lightyellow","orange","red"),
title = "p-value",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "Statistically Significant Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_rook_dest <- lisa_rook_dest %>%
mutate(label = paste("District:", district, "| P-value:", round(p_ii_sim, 3)))
tmap_mode("view")
tm_shape(lisa_rook_dest) +
tm_fill("p_ii_sim",
palette = c("green3","lightyellow","orange","red"),
title = "P-value",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")5.6 Visualising Statistically Significant Local Moran’s Ii
With that said, I would like to switch our focus to districts that display statistically significant local Moran’s I values. To execute this, I will attempt to remove all local Moran’s I values with p-values greater than 0.05. Subsequently, I will use the tmap function to plot the choropleth of statistically significant local spatial autocorrelation on the map of Jakarta.
lisa_queen_origin_sig <- lisa_queen_origin %>%
filter(p_ii_sim < 0.05)
tm_shape(lisa_queen_origin)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_origin_sig) +
tm_fill("ii",
palette = c("#B3EBF2",'green3',"green3","lightyellow","orange","orange4","red"),
title = "Local Moran's I (p-value < 0.05)",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "Statistically Significant Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_queen_origin_sig <- lisa_queen_origin_sig %>%
mutate(label = paste("District:", district, "| Local Moran's I:", round(ii, 3)))
tmap_mode("view")
tm_shape(lisa_queen_origin) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_origin_sig) +
tm_fill("ii",
palette = c("#B3EBF2",'green3',"green3","lightyellow","orange","orange4","red"),
title = "Local Moran's I (p-value < 0.05)",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")lisa_queen_dest_sig <- lisa_queen_dest %>%
filter(p_ii_sim < 0.05)
tm_shape(lisa_queen_dest)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_dest_sig) +
tm_fill("ii",
palette = c("#B3EBF2",'green3',"green3","lightyellow","orange","orange4","red"),
title = "Local Moran's I (p-value < 0.05)",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "Statistically Significant Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_queen_dest_sig <- lisa_queen_dest_sig %>%
mutate(label = paste("District:", district, "| Local Moran's I:", round(ii, 3)))
tmap_mode("view")
tm_shape(lisa_queen_dest) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_dest_sig) +
tm_fill("ii",
palette = c("#B3EBF2",'green3',"green3","lightyellow","orange","orange4","red"),
title = "Local Moran's I (p-value < 0.05)",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")lisa_queen_origin_sig <- lisa_queen_origin %>%
filter(p_ii_sim < 0.05)
tm_shape(lisa_queen_origin)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_origin_sig) +
tm_fill("ii",
palette = c("#B3EBF2",'green3',"green3","lightyellow","orange","orange4","red"),
title = "Local Moran's I (p-value < 0.05)",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "Statistically Significant Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_queen_origin_sig <- lisa_queen_origin_sig %>%
mutate(label = paste("District:", district, "| Local Moran's I:", round(ii, 3)))
tmap_mode("view")
tm_shape(lisa_queen_origin) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_origin_sig) +
tm_fill("ii",
palette = c("#B3EBF2",'green3',"green3","lightyellow","orange","orange4","red"),
title = "Local Moran's I (p-value < 0.05)",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")lisa_rook_dest_sig <- lisa_rook_dest %>%
filter(p_ii_sim < 0.05)
tm_shape(lisa_rook_dest)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_rook_dest_sig) +
tm_fill("ii",
palette = c("#B3EBF2",'green3',"green3","lightyellow","orange","orange4","red"),
title = "Local Moran's I (p-value < 0.05)",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.2) +
tm_borders(col = "black", alpha = 0.5) +
tm_layout(main.title = "Statistically Significant Spatial Autocorrelation of Grab Trip Destinations in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
asp = 1.2,
frame = TRUE) +
tm_compass(type = "8star", text.size = 0.7, size = 3, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(labels.size = 0.6, alpha = 0.1)
Code
lisa_rook_dest_sig <- lisa_rook_dest_sig %>%
mutate(label = paste("District:", district, "| Local Moran's I:", round(ii, 3)))
tmap_mode("view")
tm_shape(lisa_rook_dest) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_rook_dest_sig) +
tm_fill("ii",
palette = c("#B3EBF2",'green3',"green3","lightyellow","orange","orange4","red"),
title = "Local Moran's I (p-value < 0.05)",
midpoint = NA,
id = "label"
) +
tm_borders(col = "black", alpha = 0.5) Code
tmap_mode("plot")5.7 LISA Classification Categories
A LISA (Local Indicators of Spatial Association) map is a visualisation tool suitable for our spatial analysis in illustrating the results of Local Moran’s I. By doing so, we can identify spatial patterns, clusters, and outliers in the demand for Grab services by mapping the local statistics for each district in Jakarta.
These are the key components of the LISA maps we will plot:
High-High (HH): Areas with high no. of Grab trips surrounded by other high values (hotspots).
Low-Low (LL): Areas with low no. of Grab trips surrounded by other low values (cold spots).
High-Low (HL): Areas with high no. of Grab trips surrounded by low values (outliers).
Low-High (LH): Areas with low no. of Grab trips surrounded by high values (outliers).
5.7.1 Visualising Overall LISA of Number of Trips in Jakarta Study Area
# Let's inspect the 'mean' column
summary(lisa_queen_origin$mean) Low-Low High-Low Low-High High-High
18 8 7 11
# Let's inspect the 'mean' column
summary(lisa_queen_dest$mean) Low-Low High-Low Low-High High-High
17 7 6 14
# Let's inspect the 'mean' column
summary(lisa_rook_origin$mean) Low-Low High-Low Low-High High-High
18 8 7 11
# Let's inspect the 'mean' column
summary(lisa_rook_dest$mean) Low-Low High-Low Low-High High-High
17 7 6 14
To plot our LISA classifications across districts in Jakarta, we’ll leverage the mean category of the output of our lisa calculations, used as a reference point to determine if individual districts have higher or lower values compared to this average. This provides a baseline for detecting these spatial clusters and outliers in the dataset based on the categories mentioned above. We will produce both overall and statistically significant LISA maps below. There is almost no difference in the spread of LISA categories when queen and rook contiguity methods are used.
Code
lisa_cat_origin <- tm_shape(lisa_queen_origin) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta - Queen",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_dest <- tm_shape(lisa_queen_dest) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation of Grab Trip Destinations in Jakarta - Queen",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin, lisa_cat_dest, asp=1, nrow=2)
Code
lisa_cat_origin <- tm_shape(lisa_rook_origin) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta - Rook",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_dest <- tm_shape(lisa_rook_dest) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation of Grab Trip Destinations in Jakarta - Rook",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin, lisa_cat_dest, asp=1, nrow=2)
We can also visualise all statistically significant LISA categories where p-value < 0.05.
We can see that the queen method leads to fewer high-high categories and more low-high categories than when the rook method was used. It might suggest that the broader neighborhood definitions are causing the high-value districts to interact more with low-value ones, thus affecting the overall clustering results.
Code
lisa_queen_origin_sig <- lisa_queen_origin %>%
filter(p_ii_sim < 0.05)
lisa_cat_origin_sig <- tm_shape(lisa_queen_origin)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Grab Trip Origins in Jakarta - Queen",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_queen_dest_sig <- lisa_queen_dest %>%
filter(p_ii_sim < 0.05)
lisa_cat_dest_sig <- tm_shape(lisa_queen_dest)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Grab Trip Destinations in Jakarta - Queen",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin_sig, lisa_cat_dest_sig, asp=1, nrow=2)
Code
lisa_rook_origin_sig <- lisa_rook_origin %>%
filter(p_ii_sim < 0.05)
lisa_cat_origin_sig <- tm_shape(lisa_rook_origin) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Grab Trip Origins in Jakarta - Rook",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_rook_dest_sig <- lisa_rook_dest %>%
filter(p_ii_sim < 0.05)
lisa_cat_dest_sig <- tm_shape(lisa_rook_dest) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Grab Trip Destinations in Jakarta - Rook",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin_sig, lisa_cat_dest_sig, asp=1, nrow=2)
Here’s the interactive version of the static plots above for more readability using tmap_mode('view'). Each district, when hovered above, shows the district name and LISA category, unless it is not statistically significant then it will just show the district name.
tmap_mode("view")tmap mode set to interactive viewing
Code
# Filter significant districts
lisa_queen_origin_sig <- lisa_queen_origin %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
# First map: Overall LISA Spatial Autocorrelation
lisa_cat <- tm_shape(lisa_queen_origin) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_queen_origin_sig <- lisa_queen_origin_sig %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_queen_origin) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_queen_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")tmap_mode("view")tmap mode set to interactive viewing
Code
# Filter significant districts
lisa_queen_dest_sig <- lisa_queen_dest %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
# First map: Overall LISA Spatial Autocorrelation
lisa_cat <- tm_shape(lisa_queen_dest) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_queen_dest_sig <- lisa_queen_dest_sig %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_queen_dest) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_queen_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")tmap_mode("view")tmap mode set to interactive viewing
Code
# Filter significant districts
lisa_rook_origin_sig <- lisa_rook_origin %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
# First map: Overall LISA Spatial Autocorrelation
lisa_cat <- tm_shape(lisa_rook_origin) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_rook_origin_sig <- lisa_rook_origin_sig %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_rook_origin) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")tmap_mode("view")tmap mode set to interactive viewing
Code
# Filter significant districts
lisa_rook_dest_sig <- lisa_rook_dest %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
# First map: Overall LISA Spatial Autocorrelation
lisa_cat <- tm_shape(lisa_rook_dest) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_rook_dest_sig <- lisa_rook_dest_sig %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_rook_dest) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")5.7.2 Visualising LISA of Jakarta Study Area by Driving Mode
Conducting LISA classification of the number of Grab trips taken by driving mode can provide valuable insights into spatial disparities in travel demand across districts, by car and motorcycle. We will prepare the car_origin, car_dest, motorcycle_origin and motorcycle_dest dataframes to store the total number of trips for each origin and destination district of Grab trips. Note that I will remove districts ‘outside of Jakarta’ since we do not have geometry values for those districts.
- Car-focused clusters might highlight districts where road infrastructure, socioeconomic factors, and parking policies influence car travel demand.
- Motorcycle-focused clusters can point to areas where motorcycles are essential for accessibility, affordability, and maneuverability, especially in highly congested or densely populated districts.
- Car-focused outliers: Districts with high car trips surrounded by low-use areas may highlight unique features attracting car users, requiring tailored infrastructure solutions.
- Motorcycle-focused outliers: Areas with high motorcycle use amidst low surrounding usage may indicate critical transit hubs needing efficient drop-off zones and integrated transport solutions.
origin_driving_mode <- trips %>%
filter(origin_district != 'outside of jakarta') %>%
group_by(origin_district, driving_mode) %>%
summarise(num_of_trips = n(), .groups = "drop") %>%
pivot_wider(names_from = driving_mode, values_from = num_of_trips) %>%
rename(district = origin_district)
car_origin <- origin_driving_mode %>%
select(district, car) %>%
rename(num_of_trips = car) %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
motorcycle_origin <- origin_driving_mode %>%
select(district, motorcycle) %>%
rename(num_of_trips = motorcycle) %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
# Inspect
car_originSimple feature collection with 44 features and 2 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 3
district num_of_trips geometry
<chr> <int> <POLYGON [m]>
1 cakung 271 ((13577740 -2957434, 13577686 -2957585, 135780…
2 cempaka putih 225 ((13546309 -2954155, 13546032 -2954655, 135451…
3 cengkareng 438 ((13501268 -2903641, 13501646 -2904344, 135020…
4 cilandak 396 ((13493440 -2969564, 13493591 -2969657, 134936…
5 cilincing 173 ((13600325 -2936635, 13600614 -2937446, 136008…
6 cipayung 214 ((13537176 -3005377, 13537282 -3005431, 135372…
7 ciracas 263 ((13521824 -2996302, 13521925 -2996378, 135223…
8 danau sunter 3 ((13546437 -2939299, 13546343 -2939273, 135461…
9 danau sunter dll 27 ((13541670 -2937824, 13540217 -2937341, 135396…
10 duren sawit 362 ((13548180 -2972327, 13548729 -2972607, 135497…
# ℹ 34 more rows
dest_driving_mode <- trips %>%
filter(destination_district != 'outside of jakarta') %>%
group_by(destination_district, driving_mode) %>%
summarise(num_of_trips = n(), .groups = "drop") %>%
pivot_wider(names_from = driving_mode, values_from = num_of_trips) %>%
rename(district = destination_district)
car_dest <- dest_driving_mode %>%
select(district, car) %>%
rename(num_of_trips = car) %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
motorcycle_dest <- dest_driving_mode %>%
select(district, motorcycle) %>%
rename(num_of_trips = motorcycle) %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
# Inspect
car_destSimple feature collection with 44 features and 2 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 3
district num_of_trips geometry
<chr> <int> <POLYGON [m]>
1 cakung 336 ((13577740 -2957434, 13577686 -2957585, 135780…
2 cempaka putih 209 ((13546309 -2954155, 13546032 -2954655, 135451…
3 cengkareng 540 ((13501268 -2903641, 13501646 -2904344, 135020…
4 cilandak 413 ((13493440 -2969564, 13493591 -2969657, 134936…
5 cilincing 146 ((13600325 -2936635, 13600614 -2937446, 136008…
6 cipayung 284 ((13537176 -3005377, 13537282 -3005431, 135372…
7 ciracas 252 ((13521824 -2996302, 13521925 -2996378, 135223…
8 danau sunter 12 ((13546437 -2939299, 13546343 -2939273, 135461…
9 danau sunter dll 21 ((13541670 -2937824, 13540217 -2937341, 135396…
10 duren sawit 364 ((13548180 -2972327, 13548729 -2972607, 135497…
# ℹ 34 more rows
Next, I will compute Local Moran’s I down to the granularity of origin/destination, queen/rook and car/motorcycle. For instance, queen - origin - car will only focus on cars moving from the start of trip, using the queen contiguity method to locate its neighbours.
car_origin_nb_queen <- st_contiguity(car_origin$geometry, queen=TRUE)
car_origin_wt_queen <- st_weights(car_origin_nb_queen, style = "W", allow_zero = TRUE)
car_origin_wm_queen <- car_origin %>%
mutate(nb = car_origin_nb_queen,
wt = car_origin_wt_queen,
.before = 1)
lisa_queen_origin_car <- car_origin_wm_queen %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)car_dest_nb_queen <- st_contiguity(car_dest$geometry, queen = TRUE)
car_dest_wt_queen <- st_weights(car_dest_nb_queen, style = "W", allow_zero = TRUE)
car_dest_wm_queen <- car_dest %>%
mutate(nb = car_dest_nb_queen,
wt = car_dest_wt_queen,
.before = 1)
lisa_queen_dest_car <- car_dest_wm_queen %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)car_origin_nb_rook <- st_contiguity(car_origin$geometry, queen = FALSE)
car_origin_wt_rook <- st_weights(car_origin_nb_rook, style = "W", allow_zero = TRUE)
car_origin_wm_rook <- car_origin %>%
mutate(nb = car_origin_nb_rook,
wt = car_origin_wt_rook,
.before = 1)
lisa_rook_origin_car <- car_origin_wm_rook %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)car_dest_nb_rook <- st_contiguity(car_dest$geometry, queen = FALSE)
car_dest_wt_rook <- st_weights(car_dest_nb_rook, style = "W", allow_zero = TRUE)
car_dest_wm_rook <- car_dest %>%
mutate(nb = car_dest_nb_rook,
wt = car_dest_wt_rook,
.before = 1)
lisa_rook_dest_car <- car_dest_wm_rook %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)motorcycle_origin_nb_queen <- st_contiguity(motorcycle_origin$geometry, queen=TRUE)
motorcycle_origin_wt_queen <- st_weights(motorcycle_origin_nb_queen, style = "W", allow_zero = TRUE)
motorcycle_origin_wm_queen <- motorcycle_origin %>%
mutate(nb = motorcycle_origin_nb_queen,
wt = motorcycle_origin_wt_queen,
.before = 1)
lisa_queen_origin_motorcycle <- motorcycle_origin_wm_queen %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)motorcycle_dest_nb_queen <- st_contiguity(motorcycle_dest$geometry, queen = TRUE)
motorcycle_dest_wt_queen <- st_weights(motorcycle_dest_nb_queen, style = "W", allow_zero = TRUE)
motorcycle_dest_wm_queen <- motorcycle_dest %>%
mutate(nb = motorcycle_dest_nb_queen,
wt = motorcycle_dest_wt_queen,
.before = 1)
lisa_queen_dest_motorcycle <- motorcycle_dest_wm_queen %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)motorcycle_origin_nb_rook <- st_contiguity(motorcycle_origin$geometry, queen=FALSE)
motorcycle_origin_wt_rook <- st_weights(motorcycle_origin_nb_rook, style = "W", allow_zero = TRUE)
motorcycle_origin_wm_rook <- motorcycle_origin %>%
mutate(nb = motorcycle_origin_nb_rook,
wt = motorcycle_origin_wt_rook,
.before = 1)
lisa_rook_origin_motorcycle <- motorcycle_origin_wm_rook %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)motorcycle_dest_nb_rook <- st_contiguity(motorcycle_dest$geometry, queen = FALSE)
motorcycle_dest_wt_rook <- st_weights(motorcycle_dest_nb_rook, style = "W", allow_zero = TRUE)
motorcycle_dest_wm_rook <- motorcycle_dest %>%
mutate(nb = motorcycle_dest_nb_rook,
wt = motorcycle_dest_wt_rook,
.before = 1)
lisa_rook_dest_motorcycle <- motorcycle_dest_wm_rook %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)Now, we can visualise how spatial clusterings of Grab trips form based on driving modes, namely by car and motorcycle. We will only be looking at statistically significant LISA values i.e. p-value < 0.05.
Code
lisa_queen_origin_car_sig <- lisa_queen_origin_car %>%
filter(p_ii_sim < 0.05)
lisa_queen_origin_motorcycle_sig <- lisa_queen_origin_motorcycle %>%
filter(p_ii_sim < 0.05)
lisa_cat_origin_car <- tm_shape(lisa_queen_origin_car) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_origin_car_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta by Car",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_origin_motorcycle <- tm_shape(lisa_queen_origin_motorcycle) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_origin_motorcycle_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta by Motorcycle",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin_car, lisa_cat_origin_motorcycle, asp=1, nrow=2)
Code
lisa_queen_dest_car_sig <- lisa_queen_dest_car %>%
filter(p_ii_sim < 0.05)
lisa_queen_dest_motorcycle_sig <- lisa_queen_dest_motorcycle %>%
filter(p_ii_sim < 0.05)
lisa_cat_dest_car <- tm_shape(lisa_queen_dest_car) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_dest_car_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Destinations in Jakarta by Car",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_dest_motorcycle <- tm_shape(lisa_queen_dest_motorcycle) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_dest_motorcycle_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Destinations in Jakarta by Motorcycle",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_dest_car, lisa_cat_dest_motorcycle, asp=1, nrow=2)
Code
lisa_rook_origin_car_sig <- lisa_rook_origin_car %>%
filter(p_ii_sim < 0.05)
lisa_rook_origin_motorcycle_sig <- lisa_rook_origin_motorcycle %>%
filter(p_ii_sim < 0.05)
lisa_cat_origin_car <- tm_shape(lisa_rook_origin_car) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_rook_origin_car_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta by Car",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_origin_motorcycle <- tm_shape(lisa_rook_origin_motorcycle) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_rook_origin_motorcycle_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta by Motorcycle",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin_car, lisa_cat_origin_motorcycle, asp=1, nrow=2)
Code
lisa_rook_dest_car_sig <- lisa_rook_dest_car %>%
filter(p_ii_sim < 0.05)
lisa_rook_dest_motorcycle_sig <- lisa_rook_dest_motorcycle %>%
filter(p_ii_sim < 0.05)
lisa_cat_dest_car <- tm_shape(lisa_rook_dest_car) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_rook_dest_car_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Destinations in Jakarta by Car",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_dest_motorcycle <- tm_shape(lisa_rook_dest_motorcycle) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_rook_dest_motorcycle_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Destinations in Jakarta by Motorcycle",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_dest_car, lisa_cat_dest_motorcycle, asp=1, nrow=2)
5.7.3 Visualising LISA of Jakarta Study Area by Weather (Rain / No Rain)
Next, we will analyse how the number of Grab trips in one district may influence clusters of Grab trips surrounding it. It is worth noting that we have only acquired weather data for trip origins since we are most interested in determining how the current weather influence commuters in deciding to book a Grab.
Rain clusters: Districts with low trip counts during rain reflect a lack of reliance on ride-hailing services, possibly due to insufficient demand or alternative transport options.
No-rain clusters: Areas with high trip counts in dry conditions indicate robust travel demand and may reflect socioeconomic activity or tourism, necessitating infrastructure support to accommodate high volumes.
Rain outliers: Districts with low trip counts in rainy conditions surrounded by high usage may indicate barriers to accessing ride-hailing services, suggesting opportunities for improving transport accessibility.
No-rain outliers: Areas with high trip volumes in dry conditions surrounded by low-use districts may indicate popularity as a destination, warranting improved connectivity and transport services.
trip_weather <- trips %>%
select(origin_district, origin_weather_description_category) %>%
filter(origin_weather_description_category != 'Outside of Jakarta') %>%
group_by(origin_district, origin_weather_description_category) %>%
summarise(num_of_trips = n(), .groups = 'drop') %>%
pivot_wider(names_from = origin_weather_description_category,
values_from = num_of_trips, values_fill = list(num_of_trips = 0)) %>%
rename(district = origin_district)
no_rain_origin <- trip_weather %>%
select(district, not_rain) %>%
rename(num_of_trips = not_rain) %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
rain_origin <- trip_weather %>%
select(district, rain) %>%
rename(num_of_trips = rain) %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
# Inspect
no_rain_originSimple feature collection with 44 features and 2 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 3
district num_of_trips geometry
<chr> <int> <POLYGON [m]>
1 cakung 509 ((13577740 -2957434, 13577686 -2957585, 135780…
2 cempaka putih 398 ((13546309 -2954155, 13546032 -2954655, 135451…
3 cengkareng 799 ((13501268 -2903641, 13501646 -2904344, 135020…
4 cilandak 678 ((13493440 -2969564, 13493591 -2969657, 134936…
5 cilincing 215 ((13600325 -2936635, 13600614 -2937446, 136008…
6 cipayung 371 ((13537176 -3005377, 13537282 -3005431, 135372…
7 ciracas 465 ((13521824 -2996302, 13521925 -2996378, 135223…
8 danau sunter 4 ((13546437 -2939299, 13546343 -2939273, 135461…
9 danau sunter dll 31 ((13541670 -2937824, 13540217 -2937341, 135396…
10 duren sawit 739 ((13548180 -2972327, 13548729 -2972607, 135497…
# ℹ 34 more rows
Likewise, I compute Local Moran’s I down to the granularity of origin, queen/rook and rain/no rain For instance, queen - origin - rain will only focus on cars moving from the start of trip, in wet weather condition, using the queen contiguity method to locate its neighbours.
rain_origin_nb_queen <- st_contiguity(rain_origin$geometry, queen=TRUE)
rain_origin_wt_queen <- st_weights(rain_origin_nb_queen, style = "W", allow_zero = TRUE)
rain_origin_wm_queen <- rain_origin %>%
mutate(nb = rain_origin_nb_queen,
wt = rain_origin_wt_queen,
.before = 1)
lisa_queen_rain <- rain_origin_wm_queen %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)rain_origin_nb_rook <- st_contiguity(rain_origin$geometry, queen=FALSE)
rain_origin_wt_rook <- st_weights(rain_origin_nb_rook, style = "W", allow_zero = TRUE)
rain_origin_wm_rook <- rain_origin %>%
mutate(nb = rain_origin_nb_rook,
wt = rain_origin_wt_rook,
.before = 1)
lisa_rook_rain <- rain_origin_wm_rook %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)no_rain_origin_nb_queen <- st_contiguity(no_rain_origin$geometry, queen=TRUE)
no_rain_origin_wt_queen <- st_weights(no_rain_origin_nb_queen, style = "W", allow_zero = TRUE)
no_rain_origin_wm_queen <- no_rain_origin %>%
mutate(nb = no_rain_origin_nb_queen,
wt = no_rain_origin_wt_queen,
.before = 1)
lisa_queen_no_rain <- no_rain_origin_wm_queen %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)no_rain_origin_nb_rook <- st_contiguity(no_rain_origin$geometry, queen=FALSE)
no_rain_origin_wt_rook <- st_weights(no_rain_origin_nb_rook, style = "W", allow_zero = TRUE)
no_rain_origin_wm_rook <- no_rain_origin %>%
mutate(nb = no_rain_origin_nb_rook,
wt = no_rain_origin_wt_rook,
.before = 1)
lisa_rook_no_rain <- no_rain_origin_wm_rook %>%
mutate(local_moran = local_moran(num_of_trips, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)Now, let’s visualise how spatial clustering of Grab trips form based on weather (rain/no rain), namely by car and motorcycle. We will only be looking at statistically significant LISA values i.e. p-value < 0.05.
Code
lisa_queen_rain_sig <- lisa_queen_rain %>%
filter(p_ii_sim < 0.05)
lisa_queen_no_rain_sig <- lisa_queen_no_rain %>%
filter(p_ii_sim < 0.05)
lisa_cat_queen_rain <- tm_shape(lisa_queen_rain) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_rain_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta (Rain)",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_queen_no_rain <- tm_shape(lisa_queen_no_rain) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_no_rain_sig) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta (No Rain)",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_queen_rain, lisa_cat_queen_no_rain, asp=1, nrow=2)
Code
lisa_rook_rain_sig <- lisa_rook_rain %>%
filter(p_ii_sim < 0.05)
lisa_rook_no_rain_sig <- lisa_rook_no_rain %>%
filter(p_ii_sim < 0.05)
lisa_cat_rook_rain <- tm_shape(lisa_rook_rain) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_rain_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta (Rain)",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_rook_no_rain <- tm_shape(lisa_rook_no_rain) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_no_rain_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "LISA Spatial Autocorrelation of Grab Trip Origins in Jakarta (No Rain)",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_rook_rain, lisa_cat_rook_no_rain, asp = 1, nrow = 2)
5.7.4 Visualising LISA of Jakarta Study Area by Number of POIs
Next, I will now perform LISA classification analysis on the number of Points of Interest (POIs) in each district. Analysing POIs with LISA can yield interesting spatial patterns.
High-High clusters might reveal regions with high concentrations of POIs, possibly indicating commercial or tourist hubs.
Low-Low clusters could indicate underdeveloped or residential areas.
High-Low and Low-High outliers could be of interest for urban planners or business analysts to identify potential areas for investment or improvement.
Let’s quickly compute Local Moran’s I down to the granularity of queen and rook methods to locate its neighbours. For this case, I will not be categorising the data by origin and destination since the distribution of POIs is uniform across districts, making it unnecessary to differentiate based on origin and destination.
# Add geometry to pois_num
pois_num_sf <- pois_num %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
# Inspect
pois_num_sfSimple feature collection with 44 features and 2 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 44 × 3
district num_of_pois geometry
<chr> <int> <POLYGON [m]>
1 cakung 56 ((13577740 -2957434, 13577686 -2957585, 1357803…
2 cempaka putih 38 ((13546309 -2954155, 13546032 -2954655, 1354519…
3 cengkareng 298 ((13501268 -2903641, 13501646 -2904344, 1350205…
4 cilandak 242 ((13493440 -2969564, 13493591 -2969657, 1349367…
5 cilincing 119 ((13600325 -2936635, 13600614 -2937446, 1360082…
6 cipayung 56 ((13537176 -3005377, 13537282 -3005431, 1353726…
7 ciracas 126 ((13521824 -2996302, 13521925 -2996378, 1352232…
8 danau sunter 1 ((13546437 -2939299, 13546343 -2939273, 1354615…
9 danau sunter dll 3 ((13541670 -2937824, 13540217 -2937341, 1353960…
10 duren sawit 173 ((13548180 -2972327, 13548729 -2972607, 1354977…
# ℹ 34 more rows
pois_nb_queen <- st_contiguity(pois_num_sf$geometry, queen=TRUE)
pois_wt_queen <- st_weights(pois_nb_queen, style = "W", allow_zero = TRUE)
pois_wm_queen <- pois_num_sf %>%
mutate(nb = pois_nb_queen,
wt = pois_wt_queen,
.before = 1)
lisa_queen_pois <- pois_wm_queen %>%
mutate(local_moran = local_moran(num_of_pois, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)pois_nb_rook <- st_contiguity(pois_num_sf$geometry, queen=FALSE)
pois_wt_rook <- st_weights(pois_nb_rook, style = "W", allow_zero = TRUE)
pois_wm_rook <- pois_num_sf %>%
mutate(nb = pois_nb_rook,
wt = pois_wt_rook,
.before = 1)
lisa_rook_pois <- pois_wm_rook %>%
mutate(local_moran = local_moran(num_of_pois, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)Next, we can visualise the LISA categories associated to each district via plotting it on a choropleth map below.
Code
lisa_cat_queen_pois <- tm_shape(lisa_queen_pois)+
tm_borders(col = "black", alpha = 0.6)+
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Map of Points of Interests in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_queen_pois, asp=1)
Code
lisa_cat_rook_pois <- tm_shape(lisa_rook_pois) +
tm_borders(col = "black", alpha = 0.6) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Map of Points of Interests in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_rook_pois, asp = 1)
We can filter the statistically significant LISA observations (p-value > 0.05) and plot them on a map as shown.
Code
lisa_queen_pois_sig <- lisa_queen_pois %>%
filter(p_ii_sim < 0.05)
lisa_cat_queen_pois_sig <- tm_shape(lisa_queen_pois)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_pois_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Points of Interests in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_queen_pois_sig, asp=1)
Code
lisa_rook_pois_sig <- lisa_rook_pois %>%
filter(p_ii_sim < 0.05)
lisa_cat_rook_pois_sig <- tm_shape(lisa_rook_pois) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_pois_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Points of Interests in Jakarta",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_rook_pois_sig, asp = 1)
Let’s also visualise the interactive version of the both overall and statistically significant LISA plots for both the queen and rook contiguity method.
Code
tmap_mode("view")
# Filter significant districts
lisa_queen_pois <- lisa_queen_pois %>%
mutate(label = paste("District:", district, "| ", mean))
# First map: Overall LISA Spatial Autocorrelation
lisa_cat <- tm_shape(lisa_queen_pois) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_queen_pois_sig <- lisa_queen_pois_sig %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_queen_pois) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_queen_pois_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")Code
tmap_mode("view")
# Filter significant districts
lisa_rook_pois <- lisa_rook_pois %>%
mutate(label = paste("District:", district, "| ", mean))
# First map: Overall LISA Spatial Autocorrelation
lisa_cat <- tm_shape(lisa_rook_pois) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_rook_pois_sig <- lisa_rook_pois_sig %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_rook_pois) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_pois_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")5.7.5 Visualising LISA of Jakarta Study Area by Number of Trips per POI
Plotting the LISA categories of the number of trips per point of interest (POI) for each district can help us understand spatial patterns in travel demand intensity relative to the availability of POIs. This approach reveals insights into:
High-High and Low-Low Clusters: Districts with either high or low numbers of trips per POI may indicate varying levels of demand per resource, which can be associated with underlying factors such as accessibility, infrastructure, or population density. High-High clusters, for example, could highlight high-demand areas with limited POIs, indicating potential needs for additional amenities or services.
High-Low and Low-High Outliers: These outliers can identify unique districts where the number of trips per POI differs significantly from surrounding areas. For instance, a Low-High outlier would signal a district where the trip count per POI is lower than in neighboring districts, possibly pointing to underutilized facilities or areas that don’t attract much traffic despite available resources.
# Create trips_poi_origin
trips_poi_origin <- trips %>%
filter(origin_district != "outside of jakarta") %>%
group_by(origin_district) %>%
summarise(num_of_trips = n()) %>%
rename(district = origin_district) %>%
left_join(pois_num, by = 'district') %>%
mutate(trips_per_poi = num_of_trips/num_of_pois) %>%
select(district, trips_per_poi) %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
head(trips_poi_origin)Simple feature collection with 6 features and 2 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 13475910 ymin: -3037137 xmax: 13602610 ymax: -2900532
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 6 × 3
district trips_per_poi geometry
<chr> <dbl> <POLYGON [m]>
1 cakung 11.2 ((13577740 -2957434, 13577686 -2957585, 13578035 …
2 cempaka putih 12.1 ((13546309 -2954155, 13546032 -2954655, 13545199 …
3 cengkareng 3.11 ((13501268 -2903641, 13501646 -2904344, 13502057 …
4 cilandak 3.43 ((13493440 -2969564, 13493591 -2969657, 13493675 …
5 cilincing 2.24 ((13600325 -2936635, 13600614 -2937446, 13600827 …
6 cipayung 8.59 ((13537176 -3005377, 13537282 -3005431, 13537264 …
# Create trips_poi_dest
trips_poi_dest <- trips %>%
filter(destination_district != "outside of jakarta") %>%
group_by(destination_district) %>%
summarise(num_of_trips = n()) %>%
rename(district = destination_district) %>%
left_join(pois_num, by = 'district') %>%
mutate(trips_per_poi = num_of_trips/num_of_pois) %>%
select(district, trips_per_poi) %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
head(trips_poi_dest)Simple feature collection with 6 features and 2 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 13475910 ymin: -3037137 xmax: 13602610 ymax: -2900532
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 6 × 3
district trips_per_poi geometry
<chr> <dbl> <POLYGON [m]>
1 cakung 15.1 ((13577740 -2957434, 13577686 -2957585, 13578035 …
2 cempaka putih 10.3 ((13546309 -2954155, 13546032 -2954655, 13545199 …
3 cengkareng 3.76 ((13501268 -2903641, 13501646 -2904344, 13502057 …
4 cilandak 3.74 ((13493440 -2969564, 13493591 -2969657, 13493675 …
5 cilincing 2.33 ((13600325 -2936635, 13600614 -2937446, 13600827 …
6 cipayung 9.86 ((13537176 -3005377, 13537282 -3005431, 13537264 …
Next, I will compute Local Moran’s I down to the granularity of origin/destination and queen/rook. For instance, queen - origin will only focus on trips at the point of origin, using the queen contiguity method to locate its neighbours.
trips_poi_origin_nb_queen <- st_contiguity(trips_poi_origin$geometry, queen=TRUE)
trips_poi_origin_wt_queen <- st_weights(trips_poi_origin_nb_queen, style = "W", allow_zero = TRUE)
trips_poi_origin_wm_queen <- trips_poi_origin %>%
mutate(nb = trips_poi_origin_nb_queen,
wt = trips_poi_origin_wt_queen,
.before = 1)
lisa_queen_origin_trips_poi <- trips_poi_origin_wm_queen %>%
mutate(local_moran = local_moran(trips_per_poi, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)trips_poi_dest_nb_queen <- st_contiguity(trips_poi_dest$geometry, queen=TRUE)
trips_poi_dest_wt_queen <- st_weights(trips_poi_dest_nb_queen, style = "W", allow_zero = TRUE)
trips_poi_dest_wm_queen <- trips_poi_dest %>%
mutate(nb = trips_poi_dest_nb_queen,
wt = trips_poi_dest_wt_queen,
.before = 1)
lisa_queen_dest_trips_poi <- trips_poi_dest_wm_queen %>%
mutate(local_moran = local_moran(trips_per_poi, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)trips_poi_origin_nb_rook <- st_contiguity(trips_poi_origin$geometry, queen=FALSE)
trips_poi_origin_wt_rook <- st_weights(trips_poi_origin_nb_rook, style = "W", allow_zero = TRUE)
trips_poi_origin_wm_rook <- trips_poi_origin %>%
mutate(nb = trips_poi_origin_nb_rook,
wt = trips_poi_origin_wt_rook,
.before = 1)
lisa_rook_origin_trips_poi <- trips_poi_origin_wm_rook %>%
mutate(local_moran = local_moran(trips_per_poi, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)trips_poi_dest_nb_rook <- st_contiguity(trips_poi_dest$geometry, queen=FALSE)
trips_poi_dest_wt_rook <- st_weights(trips_poi_dest_nb_rook, style = "W", allow_zero = TRUE)
trips_poi_dest_wm_rook <- trips_poi_dest %>%
mutate(nb = trips_poi_dest_nb_rook,
wt = trips_poi_dest_wt_rook,
.before = 1)
lisa_rook_dest_trips_poi <- trips_poi_dest_wm_rook %>%
mutate(local_moran = local_moran(trips_per_poi, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)Code
lisa_cat_origin <- tm_shape(lisa_queen_origin_trips_poi) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation of Number of Trips Per POI for Trip Origins",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_dest <- tm_shape(lisa_queen_dest_trips_poi) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation of Number of Trips Per POI for Trip Destinations",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin, lisa_cat_dest, asp=1, nrow=2)
Code
lisa_cat_origin <- tm_shape(lisa_rook_origin_trips_poi) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation Number of Trips Per POI for Trip Origins",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_dest <- tm_shape(lisa_rook_dest_trips_poi) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation Number of Trips Per POI for Trip Destinations",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin, lisa_cat_dest, asp=1, nrow=2)
We can also visualise all statistically significant LISA categories where p-value < 0.05 for both origin and destination trips.
Code
lisa_queen_origin_sig <- lisa_queen_origin_trips_poi %>%
filter(p_ii_sim < 0.05)
lisa_cat_origin_sig <- tm_shape(lisa_queen_origin_trips_poi)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Number of Trips Per POI for Trip Origins",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_queen_dest_sig <- lisa_queen_dest_trips_poi %>%
filter(p_ii_sim < 0.05)
lisa_cat_dest_sig <- tm_shape(lisa_queen_dest_trips_poi)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Number of Trips Per POI for Trip Destinations",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin_sig, lisa_cat_dest_sig, asp=1, nrow=2)
Code
lisa_rook_origin_sig <- lisa_rook_origin_trips_poi %>%
filter(p_ii_sim < 0.05)
lisa_cat_origin_sig <- tm_shape(lisa_rook_origin_trips_poi) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Number of Trips Per POI for Trip Origins",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_rook_dest_sig <- lisa_rook_dest_trips_poi %>%
filter(p_ii_sim < 0.05)
lisa_cat_dest_sig <- tm_shape(lisa_rook_dest_trips_poi) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Number of Trips Per POI for Trip Destinations",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin_sig, lisa_cat_dest_sig, asp=1, nrow=2)
Here’s the interactive version of the static plots above for more readability using tmap_mode('view'). Each district, when hovered above, shows the district name and LISA category, unless it is not statistically significant then it will just show the district name.
tmap_mode("view")tmap mode set to interactive viewing
Code
# First map: Overall LISA Spatial Autocorrelation
lisa_queen_origin <- lisa_queen_origin_trips_poi %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat <- tm_shape(lisa_queen_origin) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_queen_origin_sig <- lisa_queen_origin_trips_poi %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_queen_origin) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_queen_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")tmap_mode("view")tmap mode set to interactive viewing
Code
# First map: Overall LISA Spatial Autocorrelation
lisa_queen_dest <- lisa_queen_dest_trips_poi %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat <- tm_shape(lisa_queen_dest) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_queen_dest_sig <- lisa_queen_dest_trips_poi %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_queen_dest) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_queen_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")tmap_mode("view")tmap mode set to interactive viewing
Code
# First map: Overall LISA Spatial Autocorrelation
lisa_rook_origin <- lisa_rook_origin_trips_poi %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat <- tm_shape(lisa_rook_origin) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_rook_origin_sig <- lisa_rook_origin_trips_poi %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_rook_origin) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")tmap_mode("view")tmap mode set to interactive viewing
Code
# First map: Overall LISA Spatial Autocorrelation
lisa_rook_dest <- lisa_rook_dest_trips_poi %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat <- tm_shape(lisa_rook_dest) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_rook_dest_sig <- lisa_rook_dest_sig %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_rook_dest) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")5.7.6 Visualising LISA of Jakarta Study Area by Number of Trips per Capita
Using LISA categories based on the number of trips per capita for each district can be particularly insightful, as it accounts for the population size and provides a clearer picture of travel demand relative to the resident base.
High-High and Low-Low Clusters: Districts with high trips per capita (High-High clusters) could indicate areas where residents are especially reliant on Grab transportation services, potentially due to factors like limited local amenities, employment patterns, or population density. Low-Low clusters, on the other hand, might represent districts with lower overall travel needs or alternative forms of local transportation and amenities.
High-Low and Low-High Outliers: Outliers can reveal unique spatial patterns; for example, a Low-High outlier (low trips per capita in a high-demand neighboring area) might suggest that residents are less dependent on transportation services, potentially due to robust local infrastructure. Conversely, a High-Low outlier may indicate a district with high per capita trip demand in an otherwise low-demand region, possibly due to the district being a main city with more amenities.
Take note that there are two districts, namely Danau Sunter Dll and Danua Sunter that have 0 population size as they are lakes found in Jakarta. Hence, I will exclude them from the LISA classifications since we will end up with an infinite value for its number of trips per capita.
I will use the population dataframe previously imported into the project which contains the population size per district, i.e. population_count.
# Create trips_capita_origin
trips_capita_origin <- trips %>%
filter(origin_district != "outside of jakarta") %>%
group_by(origin_district) %>%
summarise(num_of_trips = n()) %>%
rename(district = origin_district) %>%
left_join(population, by = 'district') %>%
mutate(trips_per_capita = num_of_trips/population_count) %>%
select(district, trips_per_capita) %>%
filter(district != 'danau sunter dll' & district != 'danau sunter') %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
trips_capita_originSimple feature collection with 42 features and 2 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 42 × 3
district trips_per_capita geometry
<chr> <dbl> <POLYGON [m]>
1 cakung 0.0162 ((13577740 -2957434, 13577686 -2957585, 1…
2 cempaka putih 0.0618 ((13546309 -2954155, 13546032 -2954655, 1…
3 cengkareng 0.0229 ((13501268 -2903641, 13501646 -2904344, 1…
4 cilandak 0.0492 ((13493440 -2969564, 13493591 -2969657, 1…
5 cilincing 0.00710 ((13600325 -2936635, 13600614 -2937446, 1…
6 cipayung 0.0225 ((13537176 -3005377, 13537282 -3005431, 1…
7 ciracas 0.0202 ((13521824 -2996302, 13521925 -2996378, 1…
8 duren sawit 0.0234 ((13548180 -2972327, 13548729 -2972607, 1…
9 gambir 0.295 ((13520352 -2930706, 13520451 -2930771, 1…
10 grogol petamburan 0.196 ((13518571 -2932148, 13518660 -2933219, 1…
# ℹ 32 more rows
# Create trips_capita_dest
trips_capita_dest <- trips %>%
filter(destination_district != "outside of jakarta") %>%
group_by(destination_district) %>%
summarise(num_of_trips = n()) %>%
rename(district = destination_district) %>%
left_join(population, by = 'district') %>%
mutate(trips_per_capita = num_of_trips/population_count) %>%
select(district, trips_per_capita) %>%
filter(district != 'danau sunter dll' & district != 'danau sunter') %>%
left_join(district_geom, by = 'district') %>%
st_as_sf()
trips_capita_destSimple feature collection with 42 features and 2 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 13472330 ymin: -3037137 xmax: 13602610 ymax: -2881872
Projected CRS: UCS-2000 / Ukraine TM zone 10
# A tibble: 42 × 3
district trips_per_capita geometry
<chr> <dbl> <POLYGON [m]>
1 cakung 0.0218 ((13577740 -2957434, 13577686 -2957585, 1…
2 cempaka putih 0.0527 ((13546309 -2954155, 13546032 -2954655, 1…
3 cengkareng 0.0277 ((13501268 -2903641, 13501646 -2904344, 1…
4 cilandak 0.0536 ((13493440 -2969564, 13493591 -2969657, 1…
5 cilincing 0.00739 ((13600325 -2936635, 13600614 -2937446, 1…
6 cipayung 0.0259 ((13537176 -3005377, 13537282 -3005431, 1…
7 ciracas 0.0182 ((13521824 -2996302, 13521925 -2996378, 1…
8 duren sawit 0.0230 ((13548180 -2972327, 13548729 -2972607, 1…
9 gambir 0.343 ((13520352 -2930706, 13520451 -2930771, 1…
10 grogol petamburan 0.150 ((13518571 -2932148, 13518660 -2933219, 1…
# ℹ 32 more rows
Next, I will compute Local Moran’s I down to the granularity of origin/destination and queen/rook. For instance, queen - origin will only focus on trips at the point of origin, using the queen contiguity method to locate its neighbours.
trips_capita_origin_nb_queen <- st_contiguity(trips_capita_origin$geometry, queen=TRUE)
trips_capita_origin_wt_queen <- st_weights(trips_capita_origin_nb_queen, style = "W", allow_zero = TRUE)
trips_capita_origin_wm_queen <- trips_capita_origin %>%
mutate(nb = trips_capita_origin_nb_queen,
wt = trips_capita_origin_wt_queen,
.before = 1)
lisa_queen_origin_trips_capita <- trips_capita_origin_wm_queen %>%
mutate(local_moran = local_moran(trips_per_capita, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)trips_capita_dest_nb_queen <- st_contiguity(trips_capita_dest$geometry, queen=TRUE)
trips_capita_dest_wt_queen <- st_weights(trips_capita_dest_nb_queen, style = "W", allow_zero = TRUE)
trips_capita_dest_wm_queen <- trips_capita_dest %>%
mutate(nb = trips_capita_dest_nb_queen,
wt = trips_capita_dest_wt_queen,
.before = 1)
lisa_queen_dest_trips_capita <- trips_capita_dest_wm_queen %>%
mutate(local_moran = local_moran(trips_per_capita, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)trips_capita_origin_nb_rook <- st_contiguity(trips_capita_origin$geometry, queen=FALSE)
trips_capita_origin_wt_rook <- st_weights(trips_capita_origin_nb_rook, style = "W", allow_zero = TRUE)
trips_capita_origin_wm_rook <- trips_capita_origin %>%
mutate(nb = trips_capita_origin_nb_rook,
wt = trips_capita_origin_wt_rook,
.before = 1)
lisa_rook_origin_trips_capita <- trips_capita_origin_wm_rook %>%
mutate(local_moran = local_moran(trips_per_capita, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)trips_capita_dest_nb_rook <- st_contiguity(trips_capita_dest$geometry, queen=FALSE)
trips_capita_dest_wt_rook <- st_weights(trips_capita_dest_nb_rook, style = "W", allow_zero = TRUE)
trips_capita_dest_wm_rook <- trips_capita_dest %>%
mutate(nb = trips_capita_dest_nb_rook,
wt = trips_capita_dest_wt_rook,
.before = 1)
lisa_rook_dest_trips_capita <- trips_capita_dest_wm_rook %>%
mutate(local_moran = local_moran(trips_per_capita, nb, wt,
zero.policy = TRUE, nsim = 99),
.before = 1) %>%
unnest(local_moran)Code
lisa_cat_origin <- tm_shape(lisa_queen_origin_trips_capita) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation of Number of Trips Per Capita for Trip Origins",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_dest <- tm_shape(lisa_queen_dest_trips_capita) +
tm_polygons("mean",
# blue, orange, green, red
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation of Number of Trips Per Capita for Trip Destinations",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin, lisa_cat_dest, asp=1, nrow=2)
Code
lisa_cat_origin <- tm_shape(lisa_rook_origin_trips_capita) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation Number of Trips Per Capita for Trip Origins",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_cat_dest <- tm_shape(lisa_rook_dest_trips_capita) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Overall LISA Spatial Autocorrelation Number of Trips Per Capita for Trip Destinations",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin, lisa_cat_dest, asp=1, nrow=2)
We can also visualise all statistically significant LISA categories where p-value < 0.05 for both trip origins and destinations.
Code
lisa_queen_origin_sig <- lisa_queen_origin_trips_capita %>%
filter(p_ii_sim < 0.05)
lisa_cat_origin_sig <- tm_shape(lisa_queen_origin_trips_capita)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Signifcant LISA Map of Number of Trips Per Capita for Trip Origins",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_queen_dest_sig <- lisa_queen_dest_trips_poi %>%
filter(p_ii_sim < 0.05)
lisa_cat_dest_sig <- tm_shape(lisa_queen_dest_trips_capita)+
tm_polygons() +
tm_borders(col = "black", alpha = 0.6)+
tm_shape(lisa_queen_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Signifcant LISA Map of Number of Trips Per Capita for Trip Destinations",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin_sig, lisa_cat_dest_sig, asp=1, nrow=2)
Code
lisa_rook_origin_sig <- lisa_rook_origin_trips_capita %>%
filter(p_ii_sim < 0.05)
lisa_cat_origin_sig <- tm_shape(lisa_rook_origin_trips_capita) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Number of Trips Per Capita for Trip Origins",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
lisa_rook_dest_sig <- lisa_rook_dest_trips_capita %>%
filter(p_ii_sim < 0.05)
lisa_cat_dest_sig <- tm_shape(lisa_rook_dest_trips_capita) +
tm_polygons() +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64","green3", "#d21b1c"),
title = "LISA Classification",
midpoint = NA,
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Statistically Significant LISA Map of Number of Trips Per Capita for Trip Destinations",
main.title.position = "center",
main.title.size = 1,
main.title.fontface = "bold",
legend.title.size = 0.8,
legend.text.size = 0.8,
legend.hist.size = 0.8,
legend.outside = TRUE,
legend.outside.position = "right",
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type = "8star", text.size = 1, size = 2, position = c("RIGHT", "TOP")) +
tm_scale_bar(position = c("LEFT", "BOTTOM"), text.size = 0.5) +
tm_grid(alpha = 0.1)
tmap_arrange(lisa_cat_origin_sig, lisa_cat_dest_sig, asp=1, nrow=2)
Here’s the interactive version of the static plots above for more readability using tmap_mode('view'). Each district, when hovered above, shows the district name and LISA category, unless it is not statistically significant then it will just show the district name.
#| warning: false
tmap_mode("view")tmap mode set to interactive viewing
Code
# First map: Overall LISA Spatial Autocorrelation
lisa_queen_origin <- lisa_queen_origin_trips_capita %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat <- tm_shape(lisa_queen_origin) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_queen_origin_sig <- lisa_queen_origin_trips_capita %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_queen_origin) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_queen_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")tmap_mode("view")tmap mode set to interactive viewing
Code
# First map: Overall LISA Spatial Autocorrelation
lisa_queen_dest <- lisa_queen_dest_trips_capita %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat <- tm_shape(lisa_queen_dest) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_queen_dest_sig <- lisa_queen_dest_trips_capita %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_queen_dest) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_queen_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")tmap_mode("view")tmap mode set to interactive viewing
Code
# First map: Overall LISA Spatial Autocorrelation
lisa_rook_origin <- lisa_rook_origin_trips_capita %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat <- tm_shape(lisa_rook_origin) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_rook_origin_sig <- lisa_rook_origin_trips_capita %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_rook_origin) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_origin_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")tmap_mode("view")tmap mode set to interactive viewing
Code
# First map: Overall LISA Spatial Autocorrelation
lisa_rook_dest <- lisa_rook_dest_trips_capita %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat <- tm_shape(lisa_rook_dest) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Overall LISA Classification",
id = "label")
# Second map: Statistical Significant LISA Map
lisa_rook_dest_sig <- lisa_rook_dest_trips_capita %>%
filter(p_ii_sim < 0.05) %>%
mutate(label = paste("District:", district, "| ", mean))
lisa_cat_sig <- tm_shape(lisa_rook_dest) +
tm_polygons(id = "") +
tm_borders(col = "black", alpha = 0.6) +
tm_shape(lisa_rook_dest_sig) +
tm_polygons("mean",
palette = c("lightblue1", "#ec9a64", "green3", "#d21b1c"),
title = "Significant LISA Classification",
id = "label")
tmap_arrange(lisa_cat, lisa_cat_sig, asp = 1, ncol = 2)Code
tmap_mode("plot")